A bug recently discovered by the UK Met Office and fixed by the JEDI team also resolved an unrelated issue in NASA’s JEDI system, demonstrating the advantages of generic code and interagency cooperation.
The bug affected the halo updates in the generic code. Halos are the surrounding grid squares that each processor fills with data sent by other processors; they are a convenient way to provide access to remote data in large parallel applications that run on multiple processors. Marek Wlasak, at the UK Met Office, recently noticed inconsistencies in how the halos were updated and reported that in some cases (in particular in the background error covariance application) the halos were being double-counted. He reached out to JCSDA, and François Hébert, working with the JEDI team, figured out how to eliminate the issue and pushed the new code with the bugfix to all of our partners.
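To make the idea of double-counted halos concrete, here is a minimal sketch in plain Python. It is not the JEDI code and does not reproduce the actual bug; it only assumes a 1-D field split across two simulated processors, and every name in it (`split_with_halo`, `naive_sum`, `owned_sum`) is hypothetical. It shows how a global sum goes wrong if halo cells are included alongside the cells each processor owns.

```python
# Illustrative sketch only: a 1-D field split across simulated "processors".
# All function names are hypothetical and not part of JEDI.

def split_with_halo(field, n_parts, halo=1):
    """Split `field` into n_parts chunks, each padded with up to `halo`
    copies of its neighbours' cells. Assumes len(field) % n_parts == 0."""
    size = len(field) // n_parts
    parts = []
    for p in range(n_parts):
        start, stop = p * size, (p + 1) * size
        lo = max(start - halo, 0)            # extend left into the neighbour
        hi = min(stop + halo, len(field))    # extend right into the neighbour
        parts.append({
            "data": field[lo:hi],                # owned cells plus halo copies
            "owned": (start - lo, stop - lo),    # local index range of owned cells
        })
    return parts

def naive_sum(parts):
    """Sum every local cell, halos included: halo cells get counted twice."""
    return sum(sum(p["data"]) for p in parts)

def owned_sum(parts):
    """Sum only the cells each processor owns: every cell is counted once."""
    total = 0
    for p in parts:
        first, last = p["owned"]
        total += sum(p["data"][first:last])
    return total

field = [1, 2, 3, 4, 5, 6, 7, 8]
parts = split_with_halo(field, n_parts=2)

print(sum(field))        # 36, the true global sum
print(owned_sum(parts))  # 36, matches the true sum
print(naive_sum(parts))  # 45, cells 4 and 5 were counted twice
```

In this toy setup the error equals the sum of the halo cells (4 + 5 = 9). In a real application the effect would be harder to spot because it depends on the decomposition and on the quantity being accumulated.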
When the fix went out, Ricardo Todling, at NASA GMAO, discovered that it resolved a separate issue he had been investigating: unexplained minimizer results when using GSI background error covariances in JEDI. As soon as the fix was deployed, the results became explainable. This fix also benefits our NOAA partners working on JEDI for UFS, accelerating JEDI towards operational acceptance.
“With multiple users and multiple models using generic code, you cover more use cases and implementations and you can find bugs more quickly,” said Anna Shlyaeva, head of the JEDI algorithms team. “With generic code, bug fixes in one place benefit everybody else.” That certainly proved true in this instance, where one agency found the bug, one fixed it, and another discovered that the fix resolved an entirely different issue!
For more information, see Ricardo Todling’s in-depth report here.
Photo by Miha Rekar on Unsplash