Technical FAQs

Frequently Asked Questions by Unified Forecast System (UFS) Community Users

UFS code is portable to Linux and Mac operating systems that use Intel or GNU compilers. The code has been tested on a variety of platforms widely used by atmospheric scientists, including NOAA Research & Development HPC Systems (e.g., Hera, Jet, and Gaea); the National Center for Atmospheric Research (NCAR) system, Derecho; and various Mac laptops. EPIC also provides support for running UFS applications in Singularity/Apptainer containers. These containers come with pre-built UFS code and dependencies and can be used on any platform that supports Singularity/Apptainer. Additionally, EPIC supports the use of the UFS on major commercial Cloud Service Providers (CSPs) including Google Cloud, Amazon Web Services (AWS), and Microsoft Azure. Limited support is also available for other non-commercial high-performance computing platforms. 

Currently, there are four levels of supported platforms. On preconfigured (Level 1) platforms, the UFS code is expected to build and run out-of-the-box. On configurable (Level 2) platforms, the prerequisite software libraries are expected to install successfully, but they are not available in a central location; applications and models are expected to build and run without issue once those libraries have been built. On limited-test (Level 3) platforms, developers have built the code but conducted little pre-release testing, and on build-only (Level 4) platforms, the code has been built but no pre-release testing has been conducted. View a complete description of the levels of support for more information. Individual applications and components support different subsets of machines; however, the UFS Weather Model provides regression testing support for a standard set of systems.

UFS code is provided free of charge under a variety of open-source licenses (see, e.g., the UFS Weather Model license). The computing power required to run certain applications may require access to a high-performance computing (HPC) system or a cloud-based system. Users can explore EPIC’s Cloud Cost Estimator tool on the Getting Started page to determine approximate costs for running the Short-Range Weather (SRW) Application on the cloud using Amazon Web Services (AWS). Cost information for additional cloud service providers and UFS applications will be added in the future.

UFS code consists of several models, applications, and components, each with its own GitHub repository and accompanying set of documentation. UFS code is publicly available on GitHub.

It is recommended that new users start with one of the latest application releases:

  • UFS Short-Range Weather (SRW) App v2.2.0

    Release Date: 2023-10-31
    Release Description: The SRW App v2.2.0 is an update to the v2.1.0 release from November 2022 and reflects a number of changes currently available in the SRW App develop branch. The Application is designed for short-range (up to two days) regional forecasts located anywhere on the globe. It includes a prognostic atmospheric model, pre- and post-processing, and a community workflow for running the system end-to-end. These components are documented in the SRW App User’s Guide and supported through GitHub Discussions. Key feature updates for this release include the addition of new supported platforms (i.e., Derecho, Hercules, Gaea C5), the transition to spack-stack modulefiles for most supported platforms to align with the UFS WM shift to spack-stack, the addition of the supported FV3_RAP physics suite, and support for the RRFS_NA_13km predefined grid. A comprehensive list of updates is documented in the GitHub release.
    Documentation: SRW App User’s Guide v2.2.0

  • UFS offline Land Data Assimilation (DA) System v1.2.0

    Release Date: 2023-12-11
    Release Description: The Land DA System v1.2.0 is an update to the version 1.1.0 release from May 2023 and reflects a number of changes currently available in the land-DA_workflow development branch. In the Land DA System, the Noah-MP land surface model (LSM) in the UFS Weather Model (WM) and the Joint Effort for Data assimilation Integration (JEDI) system are used to assimilate snow depth data via the Local Ensemble Transform Kalman Filter-Optimal Interpolation (LETKF-OI) algorithm. Updates for this release include integration of the UFS Noah-MP land component into the Land DA System, updates to model forcing options, CTest suite upgrades, and an upgrade of the JEDI DA system to JEDI Skylab v4.0. A comprehensive list of updates is documented in the GitHub release.
    Documentation: Land DA System User’s Guide v1.2.0

In general, applications contain a full set of pre- and post-processing utilities packaged with the UFS Weather Model. They also include documentation for users to get started with the application. The repository wiki often contains additional information, such as code contribution requirements. 

Other repositories and documentation for UFS Code can be found at the following locations:

The ufs-community GitHub Discussions page is a great place to post general questions about the UFS. When questions are specific to a particular application, model, or component, it is best to post questions directly to those repositories. Repositories with EPIC-supported GitHub Discussions include: 

 

When a repository does not include an EPIC-supported Discussions feature, users may post their questions on the ufs-community GitHub Discussions page instead. 

Users can also check out the tutorials available on the EPIC website at: https://epic.noaa.gov/tutorials/

UFS models and applications require a number of software libraries in order to compile. These libraries are conveniently available in a bundle via spack-stack.

The software stack managed by spack-stack contains two categories of libraries: 

  1. Bundled libraries (NCEPLIBS). These are libraries developed for use with NOAA weather models. 
  2. Third-party libraries (NCEPLIBS-external). These are libraries that were developed independently of the UFS Weather Model. They are general software packages that are also used by other models in the community.

In August 2023, the UFS WM switched from HPC-Stack to spack-stack. Other UFS components and applications are in the process of making this switch. Users may still use spack-stack’s predecessor, HPC-Stack (documentation here), but once all components and applications have switched to spack-stack, HPC-Stack will receive only limited support and will eventually be deprecated. Users are encouraged to switch to spack-stack as soon as possible.

The CCPP team updated its technical documentation ahead of the Short-Range Weather (SRW) Application v2.2.0 release (October 2023).

Users may also view technical documentation for CCPP v6.0.0, which is the most recent standalone CCPP release (June 2022).

CCPP continues to add new developments to the main development branch of its repositories, and these are captured in the latest CCPP technical documentation. Users should know that this documentation may have gaps or errors, since the repository is in active development.

The CCPP team updated its scientific documentation ahead of the Short-Range Weather (SRW) Application v2.2.0 release (October 2023). 

Users may also view scientific documentation for CCPP v6.0.0, which is the most recent standalone CCPP release (June 2022).

CCPP continues to add new developments to the main development branch of its repositories. The HEAD of the CCPP repositories is therefore slightly ahead of the scientific documentation.

Each UFS repository maintains its own documentation and/or wiki. Additionally, UFS models and applications use a variety of components, such as pre- and post-processing utilities and physics suites. Each of these components also maintains its own repository, documentation, and (if applicable) wiki page. Users can often view different versions of documentation by clicking on the caret symbol at the bottom left or right of any documentation hosted on Read the Docs.


Users may find the following links helpful as they explore UFS Code:

  • UFS_UTILS Preprocessing Utilities

Other Helpful Links:

UFS Weather Model (WM) Questions

The UFS Weather Model (WM) is constantly evolving, and new features are added at a rapid pace. Users can find those features in the develop branch, but documentation is not always available for the latest updates. The UFS WM is tagged frequently for public and operational releases. The ufs-srw-v2.2.0 tag of the WM is the most recent public release of the UFS WM, which was released as part of the UFS Short-Range Weather (SRW) Application v2.2.0. This tag represents a snapshot of a continuously evolving system undergoing open development. 

Since the UFS WM contains a huge number of components (e.g., dynamical core, physics, ocean coupling, infrastructure), there have been a wide variety of updates since the SRW App v2.1.0 tag. Updates include:

Previous releases of the UFS WM are available, but we recommend using the UFS WM within an application workflow (e.g., SRW App v2.2.0). Alternatively, users can run the develop branch code to check out the latest and greatest features! This code is constantly maintained via regression testing. Users can access information about previous releases of the UFS WM in the User’s Guide for each release:

Users will need to:

  1. Update input.nml by setting ldiag3d and qdiag3d to .true.
  2. Update the diag_table according to the instructions in the UFS WM documentation.
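The first step can be scripted. The sketch below is a hypothetical helper (not part of the UFS tooling) that rewrites existing logical entries in the text of input.nml using only Python's standard library. It assumes ldiag3d and qdiag3d already appear in the file (assumed here to sit in the gfs_physics_nml group) in name = value form, and it only changes their values; a namelist-aware library is more robust for anything beyond simple edits.

```python
import re


def set_namelist_logicals(nml_text: str, names: list[str], value: bool = True) -> str:
    """Set existing Fortran logical entries (e.g., ldiag3d) in namelist text.

    Hypothetical helper: it rewrites only the value token after "name =" and
    raises KeyError if an entry is missing instead of guessing where to add it.
    """
    fortran_value = ".true." if value else ".false."
    for name in names:
        # \b keeps "ldiag3d" from matching inside a longer variable name.
        pattern = re.compile(rf"(\b{re.escape(name)}\s*=\s*)[^\s,/!]+", re.IGNORECASE)
        nml_text, count = pattern.subn(rf"\g<1>{fortran_value}", nml_text)
        if count == 0:
            raise KeyError(f"{name} not found in namelist text")
    return nml_text


# Trimmed, illustrative namelist excerpt (not a complete input.nml).
sample = """&gfs_physics_nml
  ldiag3d = .false.
  qdiag3d = .false.
  do_ysu = .false.
/
"""
updated = set_namelist_logicals(sample, ["ldiag3d", "qdiag3d"])
print(updated)
```

The diag_table change in step 2 still has to be made separately.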

 

Although it may seem counterintuitive, the physics tendencies will be output in sfc*.nc files once the diag_table changes have been made. Even 3D fields will appear there. 

Users may find the following GitHub Discussions on this topic informative: 

 

To output a particular variable from FV3, users must update the field section of the diag_table file, which specifies the fields to be output at run time. Only fields registered with register_diag_field(), a routine in the FMS diag_manager module, can be used in the diag_table. A line in the field section of the diag_table file contains eight variables with the following format:

"module_name", "field_name", "output_name", "file_name", "time_sampling", "reduction_method", "regional_section", packing

These variables are defined in Table 4.25 of the UFS WM documentation on the diag_table file. 

For example, to output accumulated precipitation, the following line must appear in the diag_table file: 

"gfs_phys", "totprcp_ave", "prate_ave", "fv3_history2d", "all", .false., "none", 2

Users may refer to diag_table examples in the UFS WM repository. These files are used to configure groups of regression tests. 
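When editing a diag_table by hand, a quick check that each field line still has all eight entries can catch a dropped comma or quote. The parser below is a hypothetical helper built on Python's csv module, not an FMS tool; it assumes field lines use the comma-separated, double-quoted layout shown above and does not handle comments or the other sections of the diag_table.

```python
import csv
import io

# Entry names in the order listed for a diag_table field line.
FIELD_KEYS = ("module_name", "field_name", "output_name", "file_name",
              "time_sampling", "reduction_method", "regional_section", "packing")


def parse_diag_field_line(line: str) -> dict:
    """Split one diag_table field line into a dict keyed by entry name."""
    # skipinitialspace ignores the blank after each comma, so quoted values parse cleanly.
    row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    if len(row) != len(FIELD_KEYS):
        raise ValueError(f"expected {len(FIELD_KEYS)} entries, got {len(row)}: {line!r}")
    entry = dict(zip(FIELD_KEYS, (value.strip() for value in row)))
    entry["packing"] = int(entry["packing"])
    return entry


line = '"gfs_phys", "totprcp_ave", "prate_ave", "fv3_history2d", "all", .false., "none", 2'
entry = parse_diag_field_line(line)
print(entry["output_name"], entry["file_name"], entry["packing"])
```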

View GitHub Discussion #2016 for the question that inspired this FAQ. 

UFS Short-Range Weather (SRW) Application (App) Questions

At this time, there are ten physics suites available in the SRW App, five of which are fully supported. However, several additional physics schemes are available in the UFS Weather Model (WM) and can be enabled in the SRW App. Note that when users enable new physics schemes in the SRW App, they are using untested and unverified combinations of physics, which can lead to unexpected and/or poor results. It is recommended that users run experiments only with the supported physics suites and physics schemes unless they have an excellent understanding of how these physics schemes work and a specific research purpose in mind for making such changes. 

To enable an additional physics scheme, such as the YSU PBL scheme, users may need to modify ufs-srweather-app/parm/FV3.input.yml. This is necessary when the namelist has a logical variable corresponding to the desired physics scheme; in that case, users should set that variable to True for the physics scheme they would like to use (e.g., do_ysu = True).

It may be necessary to disable another physics scheme, too. For example, when using the YSU PBL scheme, users should disable the default SATMEDMF PBL scheme (satmedmfvdifq) by setting the satmedmf variable to False in the FV3.input.yml file. 

It may also be necessary to add or remove interstitial schemes so that communication between schemes, and between the schemes and the host model, is handled correctly. For example, the connections between clouds and radiation must be correctly established.

Regardless, users will need to modify the suite definition file (SDF) and recompile the code. For example, to activate the YSU PBL scheme, users should replace the line <scheme>satmedmfvdifq</scheme> with <scheme>ysuvdif</scheme> and recompile the code.
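For a one-line substitution like this, a text editor is enough, but the same change can be made programmatically. The sketch below uses Python's xml.etree.ElementTree to swap one scheme entry for another; the suite name and neighboring schemes in the sample are a trimmed, hypothetical excerpt rather than a real SRW App SDF. ElementTree does not preserve XML comments or the declaration line when it writes the tree back out, so reviewing the result before recompiling is advisable.

```python
import xml.etree.ElementTree as ET


def replace_scheme(sdf_xml: str, old: str, new: str) -> str:
    """Return SDF XML text with every <scheme>old</scheme> replaced by <scheme>new</scheme>."""
    root = ET.fromstring(sdf_xml)
    replaced = 0
    for scheme in root.iter("scheme"):
        if scheme.text is not None and scheme.text.strip() == old:
            scheme.text = new
            replaced += 1
    if replaced == 0:
        raise ValueError(f"scheme {old!r} not found in the SDF")
    return ET.tostring(root, encoding="unicode")


# Trimmed, hypothetical SDF excerpt around the PBL scheme.
sample_sdf = """<suite name="FV3_EXAMPLE" version="1">
  <group name="physics">
    <subcycle loop="1">
      <scheme>GFS_PBL_generic_pre</scheme>
      <scheme>satmedmfvdifq</scheme>
      <scheme>GFS_PBL_generic_post</scheme>
    </subcycle>
  </group>
</suite>"""

new_sdf = replace_scheme(sample_sdf, "satmedmfvdifq", "ysuvdif")
print(new_sdf)
```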

Depending on the scheme, additional changes to the SDF (e.g., to add, remove, or change interstitial schemes) and to the namelist (to include scheme-specific tuning parameters) may be required. Users are encouraged to reach out on GitHub Discussions to find out more from subject matter experts about recommendations for the specific scheme they want to implement. 

After making appropriate changes to the SDF and namelist files, users must ensure that they are using the same physics suite in their config.yaml file as the one they modified in FV3.input.yml. Then, run the generate_FV3LAM_wflow.py script to generate an experiment and navigate to the experiment directory. Users should see do_ysu = .true. in the namelist file (or similar, depending on the physics scheme selected), which indicates that the YSU PBL scheme is enabled.
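That final check can also be done with a short script. The hypothetical helper below reads a Fortran logical from namelist text with a regular expression. It assumes the generated namelist is the input.nml file in the experiment directory and that each setting appears once in name = value form.

```python
import re
from pathlib import Path


def read_namelist_logical(nml_text: str, name: str) -> bool:
    """Return the value of a Fortran logical entry (e.g., do_ysu) in namelist text."""
    match = re.search(rf"\b{re.escape(name)}\s*=\s*([^\s,/!]+)", nml_text, re.IGNORECASE)
    if match is None:
        raise KeyError(f"{name} not found in namelist text")
    # Fortran accepts .true./.false. as well as T/F, so look at the first letter.
    return match.group(1).strip(".").lower().startswith("t")


def check_ysu_enabled(exptdir: str) -> bool:
    """Hypothetical check: YSU on and SATMEDMF off in $EXPTDIR/input.nml (assumed location)."""
    text = Path(exptdir, "input.nml").read_text()
    return read_namelist_logical(text, "do_ysu") and not read_namelist_logical(text, "satmedmf")


sample = """&gfs_physics_nml
  do_ysu = .true.
  satmedmf = .false.
/
"""
print(read_namelist_logical(sample, "do_ysu"), read_namelist_logical(sample, "satmedmf"))
```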

You can change default parameters for a workflow task by setting them to a new value in the rocoto: tasks: section of the config.yaml file. First, be sure that the task you want to change is part of the default workflow or is included under taskgroups: in the rocoto: tasks: section of config.yaml. For instructions on how to add a task to the workflow, see this FAQ.

Once you verify that the task you want to modify is included in your workflow, you can configure the task by adding it to the rocoto: tasks: section of config.yaml. Users should refer to the YAML file where the task is defined to see how to structure the modifications (these YAML files reside in ufs-srweather-app/parm/wflow). For example, to change the wall clock time from 15 to 20 minutes for the run_post_mem###_f### tasks, users would look at post.yaml, where the post-processing tasks are defined. Formatting for tasks and metatasks should match the structure in this YAML file exactly. 


Excerpt of post.yaml

Since the run_post_mem###_f### task in post.yaml is nested under metatask_run_ens_post and metatask_run_post_mem#mem#_all_fhrs, all of these tasks and metatasks must be included under rocoto: tasks: before defining the walltime variable. Therefore, to change the walltime from 15 to 20 minutes, the rocoto: tasks: section should look like this:

rocoto:
  tasks:
    metatask_run_ens_post:
      metatask_run_post_mem#mem#_all_fhrs:
        task_run_post_mem#mem#_f#fhr#:
          walltime: 00:20:00

Notice that this section contains all three of the tasks/metatasks named above (metatask_run_ens_post, metatask_run_post_mem#mem#_all_fhrs, and task_run_post_mem#mem#_f#fhr#) and lists the walltime where the details of the task begin. While users may simply adjust the walltime variable in post.yaml, learning to make these changes in config.yaml allows for greater flexibility in experiment configuration: users can modify a single file (config.yaml) rather than (potentially) several workflow YAML files, and they can account for differences between experiments instead of hard-coding a single value.
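Every enclosing metatask must be listed because the override is matched to the default task definition by its full path of keys. The sketch below models this under an assumption: that the workflow applies config.yaml settings to the default task definitions with a recursive dictionary update. The deep_merge function is a hypothetical stand-in, not the SRW App's own code, plain dicts replace parsed YAML, and other_setting is a placeholder key.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Heavily trimmed stand-in for the post-processing task tree (other_setting is a placeholder).
defaults = {
    "metatask_run_ens_post": {
        "metatask_run_post_mem#mem#_all_fhrs": {
            "task_run_post_mem#mem#_f#fhr#": {"walltime": "00:15:00", "other_setting": "unchanged"},
        },
    },
}

# Full nesting: only walltime changes; other keys of the task are kept.
nested = {"metatask_run_ens_post": {"metatask_run_post_mem#mem#_all_fhrs": {
    "task_run_post_mem#mem#_f#fhr#": {"walltime": "00:20:00"}}}}

# Task listed without its metatasks: it lands at the top level and the real task is untouched.
flat = {"task_run_post_mem#mem#_f#fhr#": {"walltime": "00:20:00"}}

TASK_PATH = ("metatask_run_ens_post", "metatask_run_post_mem#mem#_all_fhrs",
             "task_run_post_mem#mem#_f#fhr#")


def get_task(tree: dict) -> dict:
    """Follow TASK_PATH down to the task definition."""
    node = tree
    for key in TASK_PATH:
        node = node[key]
    return node


good = deep_merge(defaults, nested)
bad = deep_merge(defaults, flat)
print(get_task(good))  # walltime updated, other_setting kept
print(get_task(bad))   # still the 15-minute default
```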

See SRW Discussion #990 for the question that inspired this FAQ. 

The predefined grids included with the SRW App are configured to run with 65 levels by default. However, advanced users may wish to vary the number of vertical levels in the grids they are using, and documentation has recently been added explaining how to do this! Users can check out the Limited Area Model Grid chapter for instructions. 

In general, there are two options for using more compute power: (1) increase the number of PEs or (2) enable more threads.

Increase Number of PEs

PEs are processing elements, which correspond to the number of MPI processes/tasks. In the SRW App, PE_MEMBER01 is the number of MPI processes required by the forecast. When QUILTING is true, it is calculated as LAYOUT_X * LAYOUT_Y + WRTCMP_write_groups * WRTCMP_write_tasks_per_group. Since these variables are connected, it is recommended that users first decide how many processors they want to use to run the forecast model and then work backwards to determine the other values.
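The formula above, and the "work backwards" approach, can be sketched as plain arithmetic. The helper names are hypothetical, the numbers are illustrative rather than recommendations, and the no-quilting branch assumes that only the compute tasks count in that case.

```python
def pe_member01(layout_x: int, layout_y: int, write_groups: int = 1,
                write_tasks_per_group: int = 0, quilting: bool = True) -> int:
    """MPI tasks for the forecast: compute tasks plus write-component tasks when quilting is on."""
    compute_tasks = layout_x * layout_y
    if not quilting:
        return compute_tasks  # assumption: no write-component tasks without quilting
    return compute_tasks + write_groups * write_tasks_per_group


def layouts_for_budget(total_pes: int, write_groups: int, write_tasks_per_group: int):
    """Work backwards: list (LAYOUT_X, LAYOUT_Y) pairs that use exactly the remaining PEs."""
    compute_tasks = total_pes - write_groups * write_tasks_per_group
    if compute_tasks <= 0:
        raise ValueError("the write component would use the entire PE budget")
    return [(x, compute_tasks // x) for x in range(1, compute_tasks + 1)
            if compute_tasks % x == 0]


# Illustrative numbers only: a 200-PE budget with one write group of 8 tasks.
options = layouts_for_budget(200, write_groups=1, write_tasks_per_group=8)
print(options)  # every factor pair of the 192 remaining compute tasks
print(pe_member01(12, 16, write_groups=1, write_tasks_per_group=8))
```

Which of the candidate pairs is sensible for a given grid is best judged against the values used by the SRW App predefined grids.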

For simplicity, it is often best to set WRTCMP_write_groups to 1. It may be necessary to increase this number in cases where a single write group cannot finish writing its output before the model is ready to write again. This occurs when the model produces output at very short time intervals.

The WRTCMP_write_tasks_per_group value will depend on domain (i.e., grid) size: a larger domain requires a higher value, while a smaller domain would likely require fewer than 5 tasks per group.

The LAYOUT_X and LAYOUT_Y variables are the number of MPI tasks to use in the horizontal x and y directions of the regional grid when running the forecast model. Note that the LAYOUT_X and LAYOUT_Y variables only affect the number of MPI tasks used to compute the forecast, not the resolution of the grid. Larger and higher-resolution grids involve more work when generating a forecast. That work can be spread out over more MPI processes (i.e., larger LAYOUT_X and LAYOUT_Y values) to increase the speed, but this requires more computational resources. There is a limit beyond which adding more MPI processes will no longer increase the speed at which the forecast completes, but the UFS scales well into the thousands of MPI processes.

Users can take a look at the SRW App predefined grids to get a better sense of what values to use for different types of grids. The Computational Parameters and Write Component Parameters sections of the SRW App User’s Guide define these variables.

Enable More Threads

In general, enabling more threads yields a smaller performance gain than doubling the number of PEs. However, it uses less memory and still improves performance. To enable more threading, set OMP_NUM_THREADS_RUN_FCST to a higher number (e.g., 2 or 4). The value must be a factor of the number of cores/CPUs per node, and the number of MPI tasks * OMP threads cannot exceed the number of cores per node. Typically, it is best not to raise this value above 4 or 5 because there is a limit to the improvement possible via OpenMP parallelization (compared to MPI parallelization, which is significantly more efficient).
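The thread constraint comes down to simple arithmetic per node. The sketch below assumes a hypothetical node with 40 cores and treats the "factor" requirement literally; actual core counts and task placement depend on the platform and batch system.

```python
def max_tasks_per_node(cores_per_node: int, omp_threads: int) -> int:
    """Largest MPI task count per node with tasks * threads <= cores per node."""
    if omp_threads < 1:
        raise ValueError("OMP_NUM_THREADS_RUN_FCST must be at least 1")
    if cores_per_node % omp_threads != 0:
        raise ValueError(f"{omp_threads} threads is not a factor of {cores_per_node} cores")
    return cores_per_node // omp_threads


def fits_on_node(tasks_per_node: int, omp_threads: int, cores_per_node: int) -> bool:
    """True if the requested tasks and threads do not oversubscribe one node."""
    return tasks_per_node * omp_threads <= cores_per_node


# Hypothetical 40-core nodes: each doubling of threads halves the MPI tasks that fit.
for threads in (1, 2, 4):
    print(threads, max_tasks_per_node(40, threads))
```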

In almost every case, it is best to regenerate the experiment from scratch, even if most of the experiment ran successfully and the modification seems minor. Some variable checks are performed in the workflow generation step, while others are done at runtime. Some settings are changed based on the cycle, and some changes may be incompatible with the output of a previous task. At this time, there is no general way to partially rerun an experiment with different settings, so it is almost always better just to regenerate the experiment from scratch.

The exception to this rule is tasks that failed due to platform reasons (e.g., disk space, incorrect file paths). In these cases, users can refer to the FAQ on how to restart a DEAD task.

Users who are insistent on modifying and rerunning an experiment that fails for non-platform reasons would need to modify variables in config.yaml and var_defns.sh at a minimum. Modifications to rocoto_defns.yaml and FV3LAM_wflow.xml may also be necessary. However, even with modifications to all appropriate variables, the task may not run successfully due to task dependencies or other factors mentioned above. If there is a compelling need to make such changes in place (e.g., resource shortage for expensive experiments), users are encouraged to reach out via GitHub Discussions for advice. 

See SRW Discussion #995 for the question that inspired this FAQ.

If you encounter issues while generating ICs and LBCs (initial and lateral boundary conditions) for a predefined 3-km grid using the UFS SRW App, there are a number of troubleshooting options. The first step is always to check the log file for the failed task, which will provide information on what went wrong. A log file for each task appears in the log subdirectory of the experiment directory (e.g., $EXPTDIR/log/make_ics).
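When several attempts have produced many log files, a quick scan can point to the relevant lines. The sketch below is a hypothetical helper that looks for a few keywords in files under $EXPTDIR/log; the keyword list is an assumption and should be adjusted to whatever failures on a given platform actually print. Reading the full log is still the reliable diagnosis.

```python
from pathlib import Path

# Hypothetical keywords; adjust them to what failures on your platform actually print.
ERROR_MARKERS = ("ERROR", "FATAL", "Killed", "TIME LIMIT")


def error_lines(lines, markers=ERROR_MARKERS):
    """Return (line number, text) pairs for lines that contain any marker."""
    return [(number, line.rstrip()) for number, line in enumerate(lines, start=1)
            if any(marker in line for marker in markers)]


def scan_task_logs(exptdir: str, pattern: str = "make_ics*"):
    """Map each matching file under $EXPTDIR/log to its suspicious lines."""
    log_dir = Path(exptdir, "log")
    if not log_dir.is_dir():
        raise FileNotFoundError(f"no log directory at {log_dir}")
    report = {}
    for log_file in sorted(log_dir.glob(pattern)):
        if not log_file.is_file():
            continue
        with log_file.open(errors="replace") as handle:
            hits = error_lines(handle)
        if hits:
            report[log_file.name] = hits
    return report
```

For example, scan_task_logs(os.environ["EXPTDIR"], "make_lbcs*") would check the boundary-condition logs instead.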

Additionally, users can try increasing the number of processors or the wallclock time requested for the jobs. Sometimes jobs fail without an explicit error message because the process is cut short (e.g., when it reaches the wallclock limit).

Users can also update the hash of UFS_UTILS in the Externals.cfg file to the HEAD of that repository. There was a known memory issue with how chgres_cube was handling regridding of the 3-D wind field for large domains at high resolutions (see ufs-community/UFS_UTILS#766 and the associated issue for more information).
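The hash change is a one-line edit in Externals.cfg. The sketch below assumes the manage_externals INI-style layout (one section per external with protocol, repo_url, hash, local_path, and required keys) and uses Python's configparser. The section contents and both hashes are hypothetical placeholders, not real commits. configparser drops comments when it rewrites the file, so a manual edit may be preferable for a single change. After the hash is changed, the external needs to be checked out again and the App rebuilt.

```python
import configparser
import io


def set_external_hash(cfg_text: str, section: str, new_hash: str) -> str:
    """Return Externals.cfg text with the hash of one external replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in Externals.cfg")
    parser[section]["hash"] = new_hash
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical excerpt; the hashes are placeholders, not real UFS_UTILS commits.
sample_cfg = """[UFS_UTILS]
protocol = git
repo_url = https://github.com/ufs-community/UFS_UTILS
hash = 1234abc
local_path = sorc/UFS_UTILS
required = True

[externals_description]
schema_version = 1.0.0
"""
updated_cfg = set_external_hash(sample_cfg, "UFS_UTILS", "abcdef0")
print(updated_cfg)
```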

When users make changes to one of the SRW App executables, they can rerun the devbuild.sh script using the command ./devbuild.sh --platform=<machine_name>. This will eventually bring up three options: [R]emove, [C]ontinue, or [Q]uit.

The Continue option will recompile the modified routines and rebuild only the affected executables. The Remove option provides a clean build; it completely removes the existing build directory and rebuilds all executables from scratch instead of reusing the existing build where possible. The build log files for the CMake and Make steps will appear in ufs-srweather-app/build/log.cmake and ufs-srweather-app/build/log.make; any errors encountered should be detailed in those files.

Users should note that the Continue option may not work as expected for changes to CCPP because the ccpp_prebuild.py script will not be rerun. It is typically best to recompile the model entirely in this case by selecting the Remove option for a clean build.

A convenience script, devclean.sh, is also available. This script can be used to remove build artifacts in cases where something goes wrong with the build or where changes have been made to the source code and the executables need to be rebuilt. Users can run this script by entering either ./devclean.sh --clean or ./devclean.sh -a. Following this step, they can rerun the devbuild.sh script to rebuild the SRW App. Running ./devclean.sh -h will list additional options available. 

See SRW Discussion #1007 for the question that inspired this FAQ.

Unified Post Processor (UPP) Questions

Users can find answers to many frequently asked UPP questions by referencing the UPP User’s Guide. Additional questions are included below.

The UPP is compatible with NetCDF4 when used on UFS model output.

We are not able to support all platform and compiler combinations out there but will try to help with specific issues when able. Users may request support on the UPP GitHub Discussions page. We always welcome and are grateful for user-contributed configurations.

Currently, the standalone release of the UPP can be used to output satellite fields if desired. The UPP documentation lists the grib2 fields, including satellite fields, produced by the UPP. After selecting which fields to output, the user must adjust the control file according to the instructions in the UPP documentation. Note that not all physics options support satellite products. Additionally, for regional runs, users must ensure that the satellite field of view overlaps some part of their domain.

Most UFS application releases do not currently support this capability, although it is available in the Short-Range Weather (SRW) Application. This SRW App pull request (PR) added the option for users to output satellite fields using the SRW App. The capability is documented in the SRW App User’s Guide.

If the desired variable is already available in the UPP code, then the user can simply add that variable to the postcntrl.xml file and remake the postxconfig-NT.txt file that the UPP reads. Please note that some variables may be dependent on the model and/or physics used.

If the desired variable is not already available in the UPP code, it can be added following the instructions for adding a new variable in the UPP User’s Guide.

There are a few possible reasons why a requested variable might not appear in the UPP output:

  1. The variable may be dependent on the model. 
  2. Certain variables are dependent on the model configuration. For example, if a variable depends on a particular physics suite, it may not appear in the output when a different physics suite is used. 
  3. The requested variable may depend on output from a different field that was not included in the model output.

If the user suspects that the UPP failed (e.g., no UPP output was produced or console output includes an error message like mv: cannot stat `GFSPRS.GrbF00`: No such file or directory), the best way to diagnose the issue is to consult the UPP runtime log file for errors. When using the standalone UPP with the run_upp script, this log file will be located in the postprd directory under the name upp.fHHH.out, where HHH refers to the 3-digit forecast hour being processed. When the UPP is used with the SRW App, the UPP log files can be found in the experiment directory under log/run_post_fHHH.log.
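The log names differ only in the three-digit forecast hour, which is easy to get wrong (e.g., f6 instead of f006). A small, hypothetical helper that builds both paths described above from an integer hour:

```python
from pathlib import Path


def upp_log_paths(fhr: int, postprd_dir: str = "postprd", exptdir: str = "."):
    """Return (standalone run_upp log, SRW App post log) for one forecast hour."""
    hhh = f"{fhr:03d}"  # 3-digit forecast hour, e.g. 6 -> "006"
    standalone_log = Path(postprd_dir) / f"upp.f{hhh}.out"
    srw_log = Path(exptdir) / "log" / f"run_post_f{hhh}.log"
    return standalone_log, srw_log


standalone_log, srw_log = upp_log_paths(6)
print(standalone_log, srw_log)
```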

UPP output is in standard grib2 format and can be interpolated to another grid using the third-party utility wgrib2. Some basic examples can also be found in the UPP User’s Guide.

This may be a memory issue; try increasing the number of CPUs or spreading them out across nodes (e.g., increase ptiles). We also know of one version of MPI (mpich v3.0.4) that does not work with UPP. A work-around was found by modifying the UPP/sorc/ncep_post.fd/WRFPOST.f routine to change all unit 5 references (which is standard I/O) to unit 4 instead.

For re-gridding grib2 unipost output, the wgrib2 utility can be used. See complete documentation on grid specification with examples of re-gridding for all available grid definitions. The Regridding section of the UPP User’s Guide also gives examples (including an example from operations) of using wgrib2 to interpolate to various common grids.
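As an illustration, the sketch below builds a wgrib2 command that interpolates a UPP grib2 file to a regular global 1-degree latitude-longitude grid using wgrib2's -new_grid option, where each axis is given as first value:number of points:spacing. The file names and grid values are hypothetical, and wgrib2 must be installed separately to actually run the command.

```python
import shlex


def wgrib2_latlon_cmd(infile: str, outfile: str, lon0: float, nlon: int, dlon: float,
                      lat0: float, nlat: int, dlat: float) -> list[str]:
    """Build a wgrib2 call that interpolates a grib2 file to a regular lat-lon grid."""
    return [
        "wgrib2", infile,
        "-new_grid_winds", "earth",   # write vector winds relative to north/east
        "-new_grid", "latlon",
        f"{lon0}:{nlon}:{dlon}",      # first longitude : number of points : spacing
        f"{lat0}:{nlat}:{dlat}",      # first latitude : number of points : spacing
        outfile,
    ]


# Hypothetical file names; a 1-degree global grid starting at 0E and 90S.
cmd = wgrib2_latlon_cmd("upp_output.grib2", "upp_output_1deg.grib2", 0, 360, 1, -90, 181, 1)
print(shlex.join(cmd))
```

On a system with wgrib2 on the PATH, the list can be passed to subprocess.run(cmd, check=True).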

This warning appears for some platforms/compilers because a call in the nemsio library is never used or referenced for a serial build. This is just a warning and should not hinder a successful build of UPP or negatively impact your UPP run.

This error message is displayed when more recent versions of the wgrib2 utility are used on forecast hour zero files that contain accumulated or time-averaged fields. This is because newer versions of wgrib2 no longer allow the n parameter to be zero or empty.

Users should consider using a separate control file (e.g., postcntrl_gfs_f00.xml) for forecast hour zero that does not include accumulated or time-averaged fields, since they are zero anyway. Users can also continue to use an older version of wgrib2; v2.0.4 is the latest known version that does not result in this error.
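One way to derive such an f00 control file is to remove the time-window fields from a copy of the regular one. The sketch below uses xml.etree.ElementTree and assumes, as a heuristic, that accumulated and time-averaged fields can be identified by ACM_ and AVE_ prefixes in the shortname entries of the control XML; other time-window fields (e.g., maxima or minima) would need additional prefixes. The sample XML is a trimmed, hypothetical excerpt, and the edited XML still has to be converted to the postxconfig-NT.txt file that the UPP reads.

```python
import xml.etree.ElementTree as ET

# Heuristic (assumption): accumulated and time-averaged fields use these shortname prefixes.
TIME_WINDOW_PREFIXES = ("ACM_", "AVE_")


def drop_time_window_params(xml_text: str, prefixes=TIME_WINDOW_PREFIXES):
    """Remove <param> entries whose <shortname> starts with one of the prefixes."""
    root = ET.fromstring(xml_text)
    removed = []
    for paramset in list(root.iter("paramset")):  # materialize before editing the tree
        for param in list(paramset.findall("param")):
            name = (param.findtext("shortname") or "").strip()
            if name.startswith(tuple(prefixes)):
                paramset.remove(param)
                removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed


sample_cntrl = """<postxml>
  <paramset>
    <datset>GFSPRS</datset>
    <param><shortname>TMP_ON_SURFACE</shortname></param>
    <param><shortname>ACM_APCP_ON_SURFACE</shortname></param>
    <param><shortname>AVE_DSWRF_ON_SURFACE</shortname></param>
  </paramset>
</postxml>"""

f00_xml, removed = drop_time_window_params(sample_cntrl)
print(removed)
```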