1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the HPE Cray EX, Warhawk, enabling the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network or modem access
  • A selected programming language and its related tools and libraries

1.2. Policies to Review

All policies are discussed in the AFRL DSRC Introductory Site Guide. All users running at the AFRL DSRC are expected to know, understand, and follow the policies discussed. If you have any questions about AFRL DSRC's policies, please contact the HPC Help Desk.

1.3. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment (pIE), commonly called a "pIE Account." If you do not yet have a pIE Account, please visit HPC Centers: Obtaining an Account and follow the instructions there. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.

1.4. Requesting Assistance

The HPC Help Desk is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

For more detailed contact information, please see our Contact Page.

2. System Configuration

2.1. System Summary

Warhawk is an HPE Cray EX system. The login and compute nodes are populated with AMD EPYC 7H12 processors clocked at 2.6 GHz. Warhawk uses the Cray Slingshot interconnect in a Dragonfly configuration as its high-speed network for Message Passing Interface (MPI) messages and I/O traffic. Warhawk uses Lustre to manage its parallel file system, which targets the disk RAID arrays.

Warhawk has 1,092 compute nodes that share memory only on the node; memory is not shared across the nodes.

Each standard compute node has two 64-core processors (128 cores) sharing 512 GB of DDR4 memory, with no user-accessible swap space.

Each large-memory compute node has two 64-core processors (128 cores) sharing 1 TB of DDR4 memory, with no user-accessible swap space.

Each visualization compute node has two 64-core processors (128 cores) and one NVIDIA Tesla V100 GPU sharing 512 GB of DDR4 memory, with no user-accessible swap space.

Each machine-learning accelerated (MLA) compute node has two 64-core processors (128 cores) and two NVIDIA Tesla V100 GPUs sharing 512 GB of DDR4 memory, with no user-accessible swap space.

Warhawk is rated at 5.1 peak PFLOPS and has 24.84 PB (formatted) of parallel disk storage.

Warhawk is intended for use as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (e.g., memory, I/O, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.

Node Configuration
                       Login             Standard          Large-Memory      Visualization     Machine-Learning
Total Nodes            7                 1,024             4                 24                40
Processor              AMD 7H12 Rome     AMD 7H12 Rome     AMD 7H12 Rome     AMD 7H12 Rome     AMD 7H12 Rome
Processor Speed        2.6 GHz           2.6 GHz           2.6 GHz           2.6 GHz           2.6 GHz
Sockets / Node         2                 2                 2                 2                 2
Cores / Node           128               128               128               128               128
Total CPU Cores        896               131,072           512               3,072             5,120
Usable Memory / Node   995 GB            503 GB            995 GB            503 GB            503 GB
Accelerators / Node    None              None              None              1                 2
Accelerator            n/a               n/a               n/a               NVIDIA V100 PCIe  NVIDIA V100 PCIe
Memory / Accelerator   n/a               n/a               n/a               32 GB             32 GB
Storage on Node        None              None              None              None              None
Interconnect           Cray Slingshot    Cray Slingshot    Cray Slingshot    Cray Slingshot    Cray Slingshot
OS                     SLES              SLES              SLES              SLES              SLES

File Systems on Warhawk
Path                                Formatted Capacity  File System Type  Storage Type  User Quota  Minimum File Retention
/p/home ($HOME)                     1.3 PB              Lustre            HDD           100 GB      None
/p/work1 ($WORKDIR)                 16.5 PB             Lustre            HDD           None        21 Days
/p/cwfs ($CENTER)                   3.3 PB              GPFS              HDD           100 TB      120 Days
/p/work1/projects ($PROJECTS_HOME)  16.5 PB             Lustre            HDD           None        None

2.2. Processors

Warhawk uses 2.6-GHz AMD EPYC 7H12 processors on its login, standard-memory, and large-memory compute nodes. There are two processors per node, each with 64 cores, for a total of 128 cores per node. Each processor has a 256-MB L3 cache.

Visualization nodes use 2.6-GHz AMD EPYC 7H12 processors. There are two processors per node, each with 64 cores, for a total of 128 cores per node. Each processor has a 256-MB L3 cache. Each visualization node has an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 640 Tensor cores.

Machine-learning accelerated nodes use 2.6-GHz AMD EPYC 7H12 processors. There are two processors per node, each with 64 cores, for a total of 128 cores per node. Each processor has a 256-MB L3 cache. Each machine-learning accelerated node has two NVIDIA Tesla V100 GPUs, each with 5,120 CUDA cores and 640 Tensor cores.

2.3. Memory

Warhawk uses both shared- and distributed-memory models. Memory is shared among all the cores on a node but is not shared among the nodes across the cluster.

Each login node contains 1 TB of main memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use more than 8 GB of memory at any one time.

Each standard compute node contains 503 GB of user-accessible shared memory.

Each large-memory compute node contains 995 GB of user-accessible shared memory.

Each visualization node consists of a standard compute node paired with an NVIDIA Tesla V100 GPU. The standard compute portion contains 503 GB of user-accessible shared memory that is exclusively available via the GPU, as well as approximately 32 GB of memory on the GPU itself.

Each machine-learning accelerated node consists of a standard compute node, paired with two NVIDIA Tesla V100 GPUs. The standard compute portion contains 503 GB of user-accessible shared memory that is exclusively available via the GPUs, as well as approximately 32 GB of memory on each of the GPUs.

2.4. Operating System

Warhawk's operating system is SUSE Linux Enterprise Server (SLES).

2.5. File Systems

Warhawk has the following file systems available for user storage:

2.5.1. /p/home

This file system is locally mounted from Warhawk's Lustre file system and has a formatted capacity of 1.3 PB. All users have a home directory located on this file system, which can be referenced by the environment variable $HOME.

2.5.2. /p/work1

This file system is locally mounted from Warhawk's Lustre file system and is tuned for parallel I/O. It has a formatted capacity of 16.5 PB. All users have a work directory located on this file system, which can be referenced by the environment variable $WORKDIR. This file system is not backed up. Users are responsible for making backups of their files to the archive server or to some other local system.

Maintaining the high performance of the Lustre file system is important for the efficient and effective use of Warhawk by all users. You should take steps to ensure your file storage and access methods follow the suggested guidelines as described in the Lustre User Guide.

2.5.3. /p/cwfs

The Center-Wide File System (CWFS) is meant for short-term storage (no longer than 120 days). All users have a directory defined in this file system, which can be referenced by the environment variable $CENTER. This is accessible from both the visualization nodes and the HPC systems login nodes. The CWFS has a formatted capacity of 3.3 PB and is managed by IBM's Spectrum Scale (formerly GPFS).

2.5.4. /p/afrl

The Mass-Storage Archival System (MSAS) is meant for long-term storage of your important data. All users have a directory defined in this file system, which can be referenced by the environment variable $ARCHIVE_HOME. This is accessible from both the transfer nodes and the HPC systems login nodes. The MSAS has 100 TB of Tier 1 archival storage (disk cache) and 6 PB of Tier 2 high-speed archival storage utilizing a robotic tape library.

2.6. Peak Performance

Warhawk is rated at 5.1 peak PFLOPS.

3. Accessing the System

3.1. Kerberos

A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Warhawk. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.

3.2. Logging In

The system host name for the Warhawk cluster is warhawk.afrl.hpc.mil, which will redirect the user to one of seven login nodes. Hostnames and IP addresses for these nodes are available upon request from the HPC Help Desk.

The preferred way to log in to Warhawk is via ssh, as follows:

% ssh username@warhawk.afrl.hpc.mil

3.3. File Transfers

File transfers to DSRC systems (except those to the local archive system) must be performed using Kerberized versions of the following tools: scp, ftp, sftp, and mpscp.

The command below uses secure copy (scp) to copy a single local file into a destination directory on a Warhawk login node. The mpscp command is similar to the scp command but has a different underlying means of data transfer and may enable greater transfer rates. The mpscp command has the same syntax as scp.

% scp local_file user@warhawk.afrl.hpc.mil:/target_dir

Both scp and mpscp can be used to send multiple files. This command transfers all files with the .txt extension to the same destination directory.

% scp *.txt user@warhawk.afrl.hpc.mil:/target_dir

The example below uses the secure file transfer protocol (sftp) to connect to Warhawk, then uses the sftp "cd" and "put" commands to change to the destination directory and copy a local file there. The sftp "quit" command ends the sftp session. Use the sftp "help" command to see a list of all sftp commands.

% sftp user@warhawk.afrl.hpc.mil

sftp> cd target_dir
sftp> put local_file
sftp> quit

The Kerberized file transfer protocol (kftp) command differs from sftp in that your username is not specified on the command line but given later when prompted. The kftp command may not be available in all environments.

% kftp warhawk.afrl.hpc.mil

username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit

Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.

4. User Environment

4.1. User Directories

The following user directories are provided for all users on Warhawk:

4.1.1. Home Directory

When you log into Warhawk, you will be placed in your home directory, /p/home/username. The environment variable $HOME is automatically set for you and refers to this directory. $HOME is visible to both the login and compute nodes and may be used to store small user files. It has an initial quota of 100 GB. $HOME is not intended as permanent storage, but files stored in $HOME are not subject to being purged.

4.1.2. Work Directory

Warhawk has one large file system, /p/work1, for the temporary storage of data files needed for executing programs. You may access your personal working directory by using the $WORKDIR environment variable, which is set for you upon login. Your $WORKDIR directory has an initial quota of 10 TB. Your $WORKDIR and the /p/work1 file system will fill up as jobs run. Please review the File Space Management Policy and be mindful of your disk usage.

REMEMBER: /p/work1 is a "scratch" file system and is not backed up. You are responsible for managing files in your $WORKDIR by backing up files to the MSAS and deleting unneeded files when your jobs end. Please review the Archive User Guide for details.

All of your jobs should execute from your $WORKDIR directory and not $HOME. While not technically forbidden, jobs that are run from $HOME are subject to smaller disk space quotas and have a much greater chance of failing if problems occur with that resource. Jobs that are run entirely from your $WORKDIR directory are more likely to complete, even if all other resources are temporarily unavailable.

Maintaining the high performance of the Lustre file system is important for the efficient and effective use of Warhawk by all users. You should take steps to ensure your file storage and access methods follow the suggested guidelines as described in the Lustre User Guide.

If you use $WORKDIR in your batch scripts, you must be careful to avoid having one job accidentally contaminate the files of another job. One way to avoid this is to use the $JOBDIR (or $WORK_DIR) directory, which is unique to each job on the system. The $JOBDIR directory is not subject to the File Space Management Policy until the job exits the workload management system.
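
The per-job isolation described above can be sketched in a batch script as follows. This is a minimal sketch, not site-provided code. The fallback to a temporary directory when $WORKDIR is unset, and the fallback directory name built from the PBS job ID, are assumptions made so the snippet also runs outside a batch job.

```shell
#!/bin/bash
# Minimal sketch: run each job in its own directory so that jobs
# sharing $WORKDIR cannot overwrite each other's files.

# Assumption: outside a batch job, fall back to a temporary directory.
work_root="${WORKDIR:-$(mktemp -d)}"

# Prefer the per-job directory provided by the system; otherwise build
# one from the PBS job ID (or the shell PID when no job ID is set).
job_dir="${JOBDIR:-$work_root/job_${PBS_JOBID:-$$}}"

mkdir -p "$job_dir"
cd "$job_dir" || exit 1

echo "Running in: $job_dir"
```

Because every path is derived from job_dir, two jobs started from the same script write to different directories.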

4.1.3. Center Directory

The Center-Wide File System (CWFS) provides file storage that is accessible from Warhawk's login nodes and from the HPC Portal. The CWFS permits file transfers and other file and directory operations from Warhawk using simple Linux commands. Each user has their own directory in the CWFS. The name of your CWFS directory may vary between machines and between centers, but the environment variable $CENTER will always refer to this directory.

The example below shows how to copy a file from your work directory to the CWFS ($CENTER) while logged into Warhawk.

% cp $WORKDIR/filename $CENTER

4.1.4. $ARCHIVE_HOME Directory

The Mass-Storage Archival System (MSAS) provides file storage that is accessible from Warhawk's login and transfer nodes. The MSAS permits file transfers and other file and directory operations from Warhawk using simple Linux commands. Each user has their own directory in the MSAS. The name of your MSAS directory may vary between machines and between centers, but the environment variable $ARCHIVE_HOME will always refer to this directory.

The example below shows how to copy a file from your work directory to the MSAS ($ARCHIVE_HOME) while logged into Warhawk.

% cp $WORKDIR/filename $ARCHIVE_HOME

4.2. Shells

The following shells are available on Warhawk: csh, bash, ksh, tcsh, zsh, and sh. To change your default shell, please email a request to require@hpc.mil. Your preferred shell will become your default shell on the Warhawk cluster within 1-2 working days.

4.3. Environment Variables

A number of environment variables are provided by default on all HPCMP HPC systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems.

4.3.1. Login Environment Variables

The following environment variables are common to both the login and batch environments:

Common Environment Variables
Variable Description
$ARCHIVE_HOME Your directory on the archive server.
$ARCHIVE_HOST The host name of the archive server.
$BC_HOST The generic (not node specific) name of the system.
$CC The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded.
$CENTER Your directory on the Center-Wide File System (CWFS).
$CSE_HOME This variable contains the path to the base directory of the default installation of the Computational Science Environment (CSE) installed on a particular compute platform. (See BC policy FY13-01 for CSE details.)
$CXX The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded.
$DAAC_HOME The directory containing DAAC-supported visualization tools: ParaView, VisIt, and EnSight.
$F77 The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded.
$F90 The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded.
$HOME Your home directory on the system.
$JAVA_HOME The directory containing the default installation of Java.
$KRB5_HOME The directory containing the Kerberos utilities.
$PET_HOME The directory containing the tools formerly installed and maintained by the PET staff. This variable is deprecated and will be removed from the system in the future. Certain tools will be migrated to $COST_HOME, as appropriate.
$PROJECTS_HOME A common directory where group-owned and supported applications and codes may be maintained for use by members of a group. Any project may request a group directory under $PROJECTS_HOME.
$SAMPLES_HOME The Sample Code Repository. This is a collection of sample scripts and codes provided and maintained by our staff to help users learn to write their own scripts. There are a number of ready-to-use scripts for a variety of applications.
$WORKDIR Your work directory on the local temporary file system (i.e., local high-speed disk).

4.3.2. Batch-Only Environment Variables

In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.

Batch-Only Environment Variables
Variable Description
$BC_CORES_PER_NODE The number of cores per node for the compute node on which a job is running.
$BC_MEM_PER_NODE The approximate maximum user-accessible memory per node (in integer MB) for the compute node on which a job is running.
$BC_MPI_TASKS_ALLOC The number of MPI tasks allocated for a job.
$BC_NODE_ALLOC The number of nodes allocated for a job.
$JOBDIR Job-specific directory in $WORKDIR immune to scrubbing while job is active.
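
As a hedged illustration of how these variables might be used inside a batch script, the sketch below derives the approximate memory available to each MPI task instead of hard-coding node sizes. The default values are assumptions chosen only so the snippet runs outside a batch job. Inside a job the system supplies the real values.

```shell
#!/bin/bash
# Minimal sketch: derive per-task memory from the batch-only variables
# instead of hard-coding node sizes in a job script.

# Assumption: the defaults below are illustrative stand-ins so the
# snippet runs outside a batch job; inside a job the system sets them.
: "${BC_MEM_PER_NODE:=515072}"     # MB per node (hypothetical value)
: "${BC_NODE_ALLOC:=4}"            # nodes allocated
: "${BC_MPI_TASKS_ALLOC:=128}"     # MPI tasks allocated

tasks_per_node=$(( BC_MPI_TASKS_ALLOC / BC_NODE_ALLOC ))
mem_per_task_mb=$(( BC_MEM_PER_NODE / tasks_per_node ))

echo "tasks per node:  $tasks_per_node"
echo "memory per task: ${mem_per_task_mb} MB (approximate)"
```

The integer division assumes the tasks are spread evenly across the allocated nodes.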

4.4. Modules

Software modules are a convenient way to set needed environment variables and include necessary directories in your path so commands for particular applications can be found. Warhawk uses "modules" to initialize your environment with COTS application software, system commands and libraries, compiler suites, environment variables, and PBS batch system commands.

A number of modules are loaded automatically as soon as you log in. To see the modules that are currently loaded, use the "module list" command. To see the entire list of available modules, use "module avail". You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.

4.5. Archive Usage

All of our HPC systems share an online Mass-Storage Archival System (MSAS) with 100 TB of Tier 1 archival storage (disk cache) and 6 PB of Tier 2 high-speed archival storage utilizing a robotic tape library. Every user is given an account on the MSAS.

Kerberized login and ftp are allowed into the MSAS system. Locally developed utilities may be used to transfer files to and from the MSAS as well as to create and delete directories, rename files, and list directory contents. For convenience, the environment variable $ARCHIVE_HOME can be used to reference your MSAS archive directory when using archive commands.

4.5.1. Archival Command Synopsis

A synopsis of the main archival utilities is listed below. For information on additional capabilities, see the Archive User Guide or read the online man pages available on each system. These non-Kerberized commands can be used either on the login nodes or in a batch submission script in the transfer queue, if desired.

Copy one or more files from the MSAS

archive get [-C path] [-s] file1 [file2...]

List files and directory contents on the MSAS

archive ls [lsopts] [file/dir ...]

Create directories on the MSAS

archive mkdir [-C path] [-m mode] [-p] [-s] dir1 [dir2 ...]

Copy one or more files to the MSAS

archive put [-C path] [-D] [-s] file1 [file2 ...]
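
The sketch below shows one way these utilities might be combined in a script. It is an illustration under stated assumptions, not a site-recommended procedure: bundling files with tar first is a common practice rather than something this guide prescribes, my_project is a hypothetical directory name, and -C is assumed to name the destination directory on the MSAS. The archive call is guarded because the utility exists only on the HPC systems.

```shell
#!/bin/bash
# Minimal sketch: bundle result files into one tar file before copying
# it to the MSAS with "archive put".

# Assumption: a throwaway directory with two sample files stands in
# for real job output so the snippet is runnable anywhere.
results_dir="$(mktemp -d)"
echo "sample output 1" > "$results_dir/run1.out"
echo "sample output 2" > "$results_dir/run2.out"

bundle="$results_dir.tar"
tar -cf "$bundle" -C "$(dirname "$results_dir")" "$(basename "$results_dir")"

# The archive utility exists only on the HPC systems, so guard the call.
if command -v archive >/dev/null 2>&1; then
    archive mkdir -p my_project            # hypothetical directory name
    archive put -C my_project "$bundle"
else
    echo "archive command not found; created $bundle only"
fi
```

Consult the Archive User Guide or the man pages for the exact meaning of each option before relying on this pattern.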

5. Program Development

5.1. Programming Models

Warhawk supports two parallel programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A hybrid MPI/OpenMP programming model is also supported. MPI is an example of the message- or data-passing models, while OpenMP uses only shared memory on a node by spawning threads. The hybrid model combines the two.

5.1.1. Message Passing Interface (MPI)

Warhawk has MPI libraries from Cray and HPE SGI. Cray MPICH and HPE SGI's Message Passing Toolkit (MPT) support the MPI 3.0 standard, as documented by the MPI Forum. MPI is part of the software support for parallel programming across a network of computer systems through a technique known as message passing. MPI establishes a practical, portable, efficient, and flexible standard for message passing that makes use of the most attractive features of a number of existing message-passing systems, rather than selecting one of them and adopting it as the standard. See "man intro_mpi" for additional information.

When creating an MPI program on Warhawk, ensure the following:

  • That either a Programming Environment (module PrgEnv-[cray, intel, gnu, nvidia, aocc]) or the MPI Message Passing Toolkit (module mpt or module hmpt) has been loaded. To check this, run the "module list" command. If neither module is listed, use one of the following commands:

    module load PrgEnv-type
    where type is cray, intel, gnu, nvidia, or aocc.

    or

    module load MPT
    where MPT is mpt or hmpt.
  • That the source code includes one of the following lines:

    INCLUDE "mpif.h" ## for Fortran

    or

    #include <mpi.h> ## for C/C++

Using the Cray MPICH Library

To compile an MPI program, use the following examples.

For C Codes:

cc -o mpi_program mpi_program.c

For Fortran Codes:

ftn -o mpi_program mpi_program.f

For C++ Codes:

CC -o mpi_program mpi_program.cpp

To run an MPI program within a batch script, use the following command:

mpiexec -n mpi_procs mpi_program [user_arguments]
or
aprun -n mpi_procs mpi_program [user_arguments]

where mpi_procs is the number of MPI processes being started. Although the commands mpiexec(1) and aprun(1) have some similar options, they are not interchangeable. The following example uses mpiexec to launch the parallel processes, but aprun would have the same syntax:

#### Starts 256 MPI processes; 128 on each node, one per core
## request 2 nodes, each with 128 cores and 128 processes per node
#PBS -l select=2:ncpus=128:mpiprocs=128
mpiexec -n 256 ./a.out

The mpiexec command launches executables across a set of compute nodes allocated to your job and, by default, utilizes all cores and nodes available to your job. When each member of the parallel application has exited, mpiexec exits.

A common concern for MPI users is the need for more memory for each process. By default, one MPI process is started on each core of a node. This means that on Warhawk, the available memory on the node is split 128 ways. To allow an individual process to use more of the node's memory, you need to start fewer processes on that node. To do this, you must request more nodes from PBS but run fewer MPI processes on each of them. For example, the following select statement requests 4 nodes with 128 cores per node, but it uses only 32 of those cores for MPI processes on each node:

#### Starts 128 MPI processes; only 32 on each node
## request 4 nodes, each with 128 cores and 32 processes per node
#PBS -l select=4:ncpus=128:mpiprocs=32
mpiexec -n 128 ./a.out
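
The relationship between the select statement and the mpiexec process count can be sketched as below. This is an illustrative helper, not part of PBS or of the site tooling: plan_job is a hypothetical function name, and the node memory figure assumes 503 GB expressed as 503 * 1024 MB. Deriving both lines from the same two numbers keeps the process count equal to nodes times processes per node.

```shell
#!/bin/bash
# Minimal sketch: derive the PBS select line and the mpiexec process
# count from the same two numbers so they cannot disagree.

cores_per_node=128      # from the node configuration in this guide
node_mem_mb=515072      # assumption: 503 GB expressed as 503 * 1024 MB

# plan_job NODES PROCS_PER_NODE
# Prints the select directive, the launch line, and memory per process.
plan_job() {
    local nodes=$1 ppn=$2
    local total=$(( nodes * ppn ))
    echo "#PBS -l select=${nodes}:ncpus=${cores_per_node}:mpiprocs=${ppn}"
    echo "mpiexec -n ${total} ./a.out"
    echo "# approx. memory per process: $(( node_mem_mb / ppn )) MB"
}

plan_job 2 128   # fully populated nodes
plan_job 4 32    # under-populated nodes, four times the memory per process
```

With these assumed figures, running 32 processes per node instead of 128 raises the approximate memory per process from 4024 MB to 16096 MB.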

For more information about mpiexec or aprun, type "man mpiexec" or "man aprun".

Using the HPE SGI MPI Library

To compile an MPI program, use the following examples.

Ensure the necessary compiler module is loaded:

module load intel  ## Intel compilers
module load gcc    ## GCC compilers
module load cce    ## Cray compilers
module load nvidia ## NVIDIA CUDA compilers
module load aocc   ## AMD Optimizing C/C++ compilers (AOCC)

For C Codes:

mpicc -cc=icc -o mpi_program mpi_program.c    ## Intel
mpicc -cc=gcc -o mpi_program mpi_program.c    ## GNU
mpicc -cc=craycc -o mpi_program mpi_program.c ## Cray
mpicc -cc=nvc -o mpi_program mpi_program.c    ## CUDA
mpicc -cc=clang -o mpi_program mpi_program.c  ## AOCC

For C++ Codes:

mpicxx -cxx=icpc -o mpi_program mpi_program.cpp    ## Intel
mpicxx -cxx=g++ -o mpi_program mpi_program.cpp     ## GNU
mpicxx -cxx=crayCC -o mpi_program mpi_program.cpp  ## Cray
mpicxx -cxx=nvc++ -o mpi_program mpi_program.cpp   ## CUDA
mpicxx -cxx=clang++ -o mpi_program mpi_program.cpp ## AOCC

For Fortran Codes:

mpif90 -f90=ifort -o mpi_program mpi_program.f     ## Intel
mpif90 -f90=gfortran -o mpi_program mpi_program.f  ## GNU
mpif90 -f90=crayftn -o mpi_program mpi_program.f90 ## Cray
mpif90 -f90=nvfortran -o mpi_program mpi_program.f ## CUDA
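
As a hedged illustration of the pairings above, the sketch below maps a compiler suite name to the C compiler passed to the mpicc wrapper. The function mpt_cc_command is hypothetical and only prints the command, so it can be tried without any compiler installed.

```shell
#!/bin/bash
# Minimal sketch: choose the underlying C compiler for the MPT mpicc
# wrapper from a suite name, using the pairings listed above.

# mpt_cc_command SUITE SOURCE
# Prints (does not run) the compile command for the given suite.
mpt_cc_command() {
    local suite=$1 src=$2 cc
    case "$suite" in
        intel)  cc=icc ;;
        gnu)    cc=gcc ;;
        cray)   cc=craycc ;;
        nvidia) cc=nvc ;;
        aocc)   cc=clang ;;
        *) echo "unknown suite: $suite" >&2; return 1 ;;
    esac
    echo "mpicc -cc=${cc} -o ${src%.c} ${src}"
}

mpt_cc_command gnu mpi_program.c
```

The matching compiler module still has to be loaded before the printed command is run.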

To run an MPI program within a batch script, use the following commands (assumes Intel compiler suite):

module unload PrgEnv-cray
module load mpt
module swap cray-pals cray-pals
mpiexec -n mpi_procs mpi_program [user_arguments]

where mpi_procs is the number of MPI processes being started.

DO NOT USE mpiexec_mpt (or mpirun, which links to mpiexec_mpt) on Warhawk with HPE SGI MPT.

For example:

#### Starts 256 MPI processes; 128 on each node, one per core
## request 2 nodes, each with 128 cores and 128 processes per node
#PBS -l select=2:ncpus=128:mpiprocs=128
module swap cray-pals cray-pals
mpiexec -n 256 ./a.out

The mpiexec command launches executables across a set of compute nodes allocated to your job and, by default, utilizes all cores and nodes available to your job. When each member of the parallel application has exited, mpiexec exits.

A common concern for MPI users is the need for more memory for each process. By default, one MPI process is started on each core of a node. This means that on Warhawk, the available memory on the node is split 128 ways. To allow an individual process to use more of the node's memory, you need to start fewer processes on that node. To do this, you must request more nodes from PBS but run fewer MPI processes on each of them. For example, the following select statement requests 4 nodes with 128 cores per node, but it uses only 32 of those cores for MPI processes on each node:

#### Starts 128 MPI processes; only 32 on each node
## request 4 nodes, each with 128 cores and 32 processes per node
#PBS -l select=4:ncpus=128:mpiprocs=32
module swap cray-pals cray-pals
mpiexec -n 128 ./a.out

For more information about mpiexec or aprun, type "man mpiexec" or "man aprun".

5.1.2. Open Multi-Processing (OpenMP)

OpenMP is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. It supports shared-memory multiprocessing programming in C, C++, and Fortran and consists of a set of compiler directives, library routines, and environment variables that influence compilation and run-time behavior.

When creating an OpenMP program on Warhawk, ensure the following:

  • If using OpenMP functions (e.g., omp_get_wtime), that the source code includes one of the following lines:

    INCLUDE 'omp_lib.h' ## for Fortran

    or

    #include <omp.h> ## for C/C++

    Or, if the code is written in Fortran 90 or later, the following line may be used instead:

    USE omp_lib
  • That the compile command includes an option to reference the OpenMP library. The Cray, Intel, and GNU compilers support OpenMP, but the required option differs between compilers.

To compile an OpenMP program, use the following examples.

Ensure the necessary compiler module is loaded:

module load cce   ## Cray compilers
module load intel ## Intel compilers
module load gcc   ## GCC compilers

For C codes:

craycc -fopenmp -o OpenMP_program OpenMP_program.c ## Cray
icc -qopenmp -o OpenMP_program OpenMP_program.c    ## Intel
gcc -fopenmp -o OpenMP_program OpenMP_program.c    ## GNU

For C++ codes:

crayCC -fopenmp -o OpenMP_program OpenMP_program.cpp ## Cray
icpc -qopenmp -o OpenMP_program OpenMP_program.cpp   ## Intel
g++ -fopenmp -o OpenMP_program OpenMP_program.cpp    ## GNU

For Fortran codes:

crayftn -homp -o OpenMP_program OpenMP_program.f     ## Cray
ifort -qopenmp -o OpenMP_program OpenMP_program.f    ## Intel
gfortran -fopenmp -o OpenMP_program OpenMP_program.f ## GNU

See section 5.2 for additional information on available compilers.

When running OpenMP applications, the $OMP_NUM_THREADS environment variable determines the number of threads. If there is no explicit assignment within the job script, $OMP_NUM_THREADS is assigned the value of the ompthreads resource in the PBS select statement. If neither assignment exists, $OMP_NUM_THREADS is assigned the value of the ncpus resource in the PBS select statement. For example:

export OMP_NUM_THREADS=128
./OpenMP_program [user_arguments]

In the example above, the application starts the OpenMP_program on one node and spawns a total of 128 threads. Since Warhawk has 128 cores per compute node, this yields 1 thread per core.
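
The precedence described in this section can be written out as a small function. This is only an illustration of the stated rule: resolve_threads is a hypothetical helper, not something PBS or OpenMP provides, and the values passed to it are made up.

```shell
#!/bin/bash
# Minimal sketch of the precedence described above:
#   1. an explicit OMP_NUM_THREADS assignment in the job script,
#   2. otherwise the ompthreads value from the PBS select statement,
#   3. otherwise the ncpus value from the PBS select statement.

# resolve_threads EXPLICIT OMPTHREADS NCPUS
# Pass an empty string for a value that was not specified.
resolve_threads() {
    local explicit=$1 ompthreads=$2 ncpus=$3
    if [ -n "$explicit" ]; then
        echo "$explicit"
    elif [ -n "$ompthreads" ]; then
        echo "$ompthreads"
    else
        echo "$ncpus"
    fi
}

resolve_threads 64 16 128   # explicit assignment wins
resolve_threads "" 16 128   # falls back to ompthreads
resolve_threads "" "" 128   # falls back to ncpus
```

Setting OMP_NUM_THREADS explicitly in the job script is the simplest way to avoid relying on the fallbacks.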

5.1.3. Hybrid Processing (MPI/OpenMP)

An application built with the hybrid model of parallel programming can run on Warhawk using both OpenMP and MPI.

When creating a hybrid (MPI/OpenMP) program on Warhawk, follow the instructions in the MPI and OpenMP sections above for creating your program. Then use the compilation instructions for OpenMP.

To run a hybrid program within a batch script, set $OMP_NUM_THREADS equal to the number of threads in the team. Then launch your program using mpiexec as follows:

####  MPI/OpenMP on 4 nodes, 8 MPI processes total with 16 threads each
## request 4 nodes, each with 128 cores and 2 processes per node
#PBS -l select=4:ncpus=128:mpiprocs=2:ompthreads=16
## assign 8 MPI processes with 2 MPI processes per node
export OMP_NUM_THREADS=16
[Necessary module commands]
mpiexec -n 8 ./mpi_program
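
A hybrid layout can be sanity-checked before submission with a sketch like the one below. The function check_hybrid is a hypothetical helper written for illustration. It assumes one thread per core and checks that the processes per node multiplied by the threads per process do not exceed the 128 cores on a node.

```shell
#!/bin/bash
# Minimal sketch: check a hybrid MPI/OpenMP layout before submitting it.

cores_per_node=128   # from the node configuration in this guide

# check_hybrid NODES PROCS_PER_NODE THREADS_PER_PROC
# Fails if the layout needs more threads per node than there are cores.
check_hybrid() {
    local nodes=$1 ppn=$2 threads=$3
    local used=$(( ppn * threads ))
    if [ "$used" -gt "$cores_per_node" ]; then
        echo "oversubscribed: $used threads on $cores_per_node cores" >&2
        return 1
    fi
    echo "total MPI processes: $(( nodes * ppn ))"
    echo "cores used per node: $used of $cores_per_node"
}

check_hybrid 4 2 16
```

For the layout shown in this section (4 nodes, 2 processes per node, 16 threads each), the check reports 8 MPI processes using 32 of the 128 cores on each node.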

5.2. Available Compilers

Warhawk has five programming environment suites.

  • Cray
  • Intel
  • GNU
  • NVIDIA CUDA
  • AMD Optimizing C/C++ compilers (AOCC)

Warhawk has two MPI suites.

  • Cray MPICH
  • SGI MPT

All versions of MPI share a common base set of compilers available on both the login and compute nodes.

Common Compiler Commands
Language        Cray      Intel     GNU       AOCC      NVIDIA     Serial/Parallel
C               craycc    icc       gcc       clang     nvc        Serial
C++             crayCC    icpc      g++       clang++   nvc++      Serial
Fortran 77      crayftn   ifort     gfortran  n/a       nvfortran  Serial
Fortran 90/95   crayftn   ifort     gfortran  n/a       nvfortran  Serial
Cray MPICH codes are built using the following commands with the underlying compiler command determined by the loaded modulefile PrgEnv-*.

Cray Programming Environment Compiler Commands
Language        Cray    Intel   GNU     AOCC    NVIDIA  Serial/Parallel
C               cc      cc      cc      cc      cc      Serial/Parallel
C++             CC      CC      CC      CC      CC      Serial/Parallel
Fortran 77      ftn     ftn     ftn     ftn     ftn     Serial/Parallel
Fortran 90/95   ftn     ftn     ftn     ftn     ftn     Serial/Parallel

SGI MPT codes are built using the following compiler wrappers with the underlying compiler command designated with a command-line option or environment variable.

HPE SGI MPT Compiler Wrapper Scripts
Language    Cray     Intel    GNU      NVIDIA   AOCC     Serial/Parallel
MPI C       mpicc    mpicc    mpicc    mpicc    mpicc    Parallel
MPI C++     mpicxx   mpicxx   mpicxx   mpicxx   mpicxx   Parallel
MPI f77     mpif77   mpif77   mpif77   mpif77   mpif77   Parallel
MPI f90     mpif90   mpif90   mpif90   mpif90   mpif90   Parallel

5.2.1. Cray Compiler Environment

The following table lists some of the more common options you can use.

Cray Compiler Options
Option Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable "outfile" rather than the default "a.out".
-Olevel Set the optimization level. For more information on optimization, see the section on Profiling and Optimization.
-f free Process Fortran codes using free form.
-fpic or -fPIC Generate position-independent code for shared libraries.
-g Generate symbolic debug information.
-m0 Reports detailed information about code optimizations to stdout as compile proceeds.
-fopenmp Recognize OpenMP directives (C/C++).
-homp Recognize OpenMP directives (Fortran).
-hdynamic Compiling using shared objects.
-K traps=fp Trap floating point, divide by zero, and overflow exceptions.

Detailed information about these and other compiler options is available in the Cray compiler (craycc, crayCC, and crayftn) man pages on Warhawk.

5.2.2. Intel Compiler Environment

The following table lists some of the more common options you can use.

Intel Compiler Options
Option Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable "outfile" rather than the default "a.out".
-Olevel Set the optimization level. For more information on optimization, see the section on Profiling and Optimization.
-free Process Fortran codes using free form.
-fpic or -fPIC Generate position-independent code for shared libraries.
-convert big_endian Read and write unformatted files in big-endian format; the default is little-endian.
-g Generate symbolic debug information.
-qopt-report[=n] Generate an optimization report with n levels of detail.
-qopenmp Recognize OpenMP directives.
-Bdynamic Compiling using shared objects.
-fpe-all=0 Trap floating point, divide by zero, and overflow exceptions.

Detailed information about these and other compiler options is available in the Intel compiler (ifort, icc, and icpc) man pages on Warhawk.

5.2.3. GNU Compiler Collection

The GNU Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options you can use.

GNU Compiler Options
Option Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable "outfile" rather than the default "a.out".
-Olevel Set the optimization level. For more information on optimization, see the section on Profiling and Optimization.
-g Generate symbolic debug information.
-Bstatic Causes executable to link to all libraries statically.
-fconvert=big-endian Read and write unformatted files in big-endian format; the default is little-endian.
-Wextra -Wall Turns on increased error reporting.

Detailed information about these and other compiler options is available in the GNU compiler (gfortran, gcc, and g++) man pages on Warhawk.

5.3. Relevant Modules

By default, Warhawk loads the PrgEnv-cray modulefile (Cray compiler suite, Cray MPICH, Cray LibSci, Parallel Application Launch Services [PALS]) for you. For more information on using modules, see the Modules User Guide.

5.4. Libraries

5.4.1. Cray LibSci

Warhawk provides the Cray LibSci library, a collection of numerical routines tuned for Cray computers. The routines, which are available via both Fortran and C interfaces, include:

  • BLAS (Levels 1, 2, and 3)
  • CBLAS – a C interface to legacy BLAS
  • LAPACK
  • BLACS
  • ScaLAPACK plus PBLAS (Levels 1, 2, and 3)
  • IRT (Iterative Refinement Toolkit) - a library of solvers and tools that provides solutions to linear systems

To link to the Cray LibSci libraries, add the entry

-lsci_compiler[_mpi][_mp]

where compiler can be cray, intel, gnu, nvidia, or aocc, and, optionally, _mpi and _mp select MPI-enabled and multithreaded versions, respectively.
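
The naming pattern above can be sketched as a small shell helper that assembles the link flag from its parts. The function name libsci_flag is hypothetical; only the -lsci_compiler[_mpi][_mp] pattern comes from this guide.

```shell
#!/bin/bash
# Build the Cray LibSci link flag: -lsci_compiler[_mpi][_mp]
# usage: libsci_flag <compiler> [mpi] [mp]
libsci_flag() {
  local compiler=$1 flag
  shift
  case "$compiler" in
    cray|intel|gnu|nvidia|aocc) ;;
    *) echo "unsupported compiler: $compiler" >&2; return 1 ;;
  esac
  flag="-lsci_${compiler}"
  # Suffix order is fixed: _mpi first, then _mp.
  if [[ " $* " == *" mpi "* ]]; then flag+="_mpi"; fi
  if [[ " $* " == *" mp "* ]];  then flag+="_mp";  fi
  echo "$flag"
}

libsci_flag gnu            # serial, single-threaded
libsci_flag cray mpi mp    # MPI-enabled and multithreaded
```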

5.4.2. Other Cray-supplied Libraries

The following Cray-optimized libraries are also available:

  • FFTW – Discrete Fourier Transform libraries
  • HDF5 – Hierarchical Data Format library (serial and parallel)
  • NetCDF – Network Common Data Format library (serial and parallel)

The modulefiles for these libraries are of the form cray-library_name. Use

module avail cray-

to find the appropriate modulefile. More information about linking these libraries is in the documentation, which is available after loading the associated modulefiles.

5.4.3. Additional Math Libraries

There is also an extensive set of Math libraries available in the $PET_HOME/MATH directory on Warhawk. Information about these libraries can be found on the Baseline Configuration website at BC policy FY13-01.

5.5. Debuggers

Warhawk provides the GNU Project Debugger (gdb), TotalView, and DDT debuggers to assist users in debugging their code.

5.5.1. GDB

The GNU Project Debugger (gdb) is a source-level debugger that can be invoked either with a program for execution or a running process id. To start gdb on your program, optionally naming a core file to examine, use the following command:

gdb a.out corefile

To attach gdb to a program that is already executing on a node, use the following command:

gdb a.out pid

For more information, the GDB manual can be found at http://www.gnu.org/software/gdb.

5.5.2. TotalView

TotalView is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, and mixed-language codes. Advanced features include on-demand memory leak detection, other heap allocation debugging features, and the Standard Template Library Viewer (STLView). Unique features like dive, a wide variety of breakpoints, the Message Queue Graph/Visualizer, powerful data analysis, and control at the thread level are also available.

Follow the steps below to use TotalView on Warhawk via a UNIX X-Windows interface.

  1. Ensure an X server is running on your local system. Linux users will likely have this by default, but MS Windows users will need to install a third-party X Windows solution. There are various options available.
  2. For Linux users, connect to Warhawk using "ssh -Y". Windows users will need to use PuTTY with X11 forwarding enabled (Connection->SSH->X11->Enable X11 forwarding).
  3. Compile your program on Warhawk with the "-g" option.
  4. Submit an interactive job:

    qsub -l select=1:ncpus=128:mpiprocs=128 -A Project_ID \
    -l walltime=00:30:00 -q debug -X -I

    Once your job has been scheduled, you will be logged into an interactive batch session on a service node that is shared with other users.
  5. Load the TotalView module:

    module load totalview

  6. Start program execution:

    mpiexec -tv -np 4 ./my_mpi_prog.exe arg1 arg2...

  7. After a short delay, the TotalView windows will pop up. Click "Go" and then "Yes" to start program execution.

An example of using TotalView can be found in $SAMPLES_HOME/Programming/Totalview_Example on Warhawk. For more information on using TotalView, see the TotalView Documentation page.

5.5.3. DDT

DDT is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, Co-array Fortran, UPC, and CUDA. Memory debugging and data visualization are supported for large-scale parallel applications. The Parallel Stack Viewer is a unique way to see the program state of all processes and threads at a glance.

To use DDT on Warhawk, follow steps 1 through 4 (above) as for TotalView but load and use the DDT debugger instead.

  1. Load the DDT module:

    module load ddt
  2. Start program execution:

    ddt -n 4 ./my_mpi_prog.exe arg1 arg2 ...
  3. The DDT window will pop up. Verify the application name and number of MPI processes. Click "Run".

An example of using DDT can be found in $SAMPLES_HOME/Programming/DDT_Example on Warhawk.

5.6. Code Profiling and Optimization

Profiling is the process of analyzing the execution flow and characteristics of your program to identify sections of code that are likely candidates for optimization, which increases the performance of a program by modifying certain aspects for increased efficiency.

We provide two profiling tools, gprof and codecov, to assist you in the profiling process. In addition, a basic overview of optimization methods with information about how they may improve the performance of your code can be found in Performance Optimization Methods (below).

5.6.1. gprof

The GNU Project Profiler (gprof) is a profiler that shows how your program is spending its time and which function calls are made. To profile code using gprof, use the "-pg" option during compilation.
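
A typical gprof session has three steps: build with "-pg", run the program so that it writes gmon.out, and then run gprof on the executable and that file. The sketch below prints those commands rather than executing them. The file names are placeholders, and gcc stands in for whichever compiler command you actually use.

```shell
#!/bin/bash
# Typical gprof workflow, printed rather than executed (DRY_RUN=1 by default).
# myprog.c and myprog.exe are hypothetical placeholder names.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run gcc -pg -O2 -o myprog.exe myprog.c   # 1. compile and link with -pg
run ./myprog.exe                         # 2. run normally; gmon.out is written at exit
run gprof ./myprog.exe gmon.out          # 3. report the flat profile and call graph
```

Set DRY_RUN=0 to execute the commands once the placeholder names match your own files.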

5.6.2. Additional Profiling Tools

There is also a set of profiling tools available in the $PET_HOME/pkgs directory on Warhawk. Information about these tools may be found on the Baseline Configuration website at BC policy FY13-01.

5.6.3. Program Development Reminders

If an application is not programmed for distributed memory, then only the cores on a single node can be used. This is limited to 128 cores on Warhawk.

Keep the system architecture in mind during code development. For instance, if your program requires more memory than is available on a single node, then you will need to parallelize your code so it can function across multiple nodes.

5.6.4. Compiler Optimization Options

The "-Olevel" option enables code optimization when compiling. The level you choose (0-3 in the table below) determines how aggressive the optimization will be. Increasing levels of optimization may increase performance significantly, but a loss of precision may also occur. There are also additional options that may enable further optimizations. The following table contains the most commonly used options.

Compiler Optimization Options
Option Description Compiler Suite
-O0 No Optimization. (default in GNU) All
-O1 Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimization. All
-O2 Level 1 plus traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer. Generally safe and beneficial. (default in Cray and Intel) All
-O3 Levels 1 and 2 plus more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable. Generally beneficial. All
-fipa-* The GNU compilers automatically enable IPA at various -O levels. To set these manually, see the options beginning with -fipa in the gcc man page. GNU
-finline-functions Enables function inlining within a single file Intel
-ipo[n] Enables interprocedural optimization between files and produces up to n object files Intel
-inline-level=n Number of levels of inlining (default: n=2) Intel
-qopt-report[=n] Generate optimization report with n levels of detail Intel
-xHost Compiler generates code with the highest instruction set available on the processor. Intel

5.6.5. Performance Optimization Methods

Optimization generally increases compilation time and executable size and may make debugging difficult. However, it usually produces code that runs significantly faster. The optimizations you can use will vary depending on your code and the system on which you are running.

Note: Before considering optimization, you should always ensure your code runs correctly and produces valid output.

In general, there are four main categories of optimization:

  • Global Optimization
  • Loop Optimization
  • Interprocedural Analysis and Optimization (IPA)
  • Function Inlining

Global Optimization

A technique that looks at the program as a whole and may perform any of the following actions:

  • Perform optimization on code over all its basic blocks
  • Perform control-flow and data-flow analysis for an entire program
  • Detect all loops, including those formed by IF and GOTO statements, and perform general optimization
  • Constant propagation
  • Copy propagation
  • Dead store elimination
  • Global register allocation
  • Invariant code motion
  • Induction variable elimination

Loop Optimization

A technique that focuses on loops (for, while, etc.) in your code and looks for ways to reduce loop iterations or parallelize the loop operations. The following types of actions may be performed:

  • Vectorization - rewrites loops to improve memory access performance. Some compilers may also support automatic loop vectorization by converting loops to utilize low-level hardware instructions and registers if they meet certain criteria.
  • Loop unrolling - (also known as "unwinding") replicates the body of loops to reduce loop branching overhead and provide better opportunities for local optimization.
  • Parallelization - divides loop operations over multiple processors where possible.

Interprocedural Analysis and Optimization (IPA)

A technique that allows the use of information across function call boundaries to perform optimizations that would otherwise be unavailable.

Function Inlining

A technique that seeks to reduce function call and return overhead. It:

  • Is used with functions that are called numerous times from relatively few locations.
  • Allows a function call to be replaced by a copy of the body of that function.
  • May create opportunities for other types of optimization.
  • May not be beneficial. Improper use may increase code size and actually result in less efficient code.

6. Batch Scheduling

6.1. Scheduler

The Portable Batch System (PBS) is currently running on Warhawk. It schedules jobs, manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single-processor and multiprocessor jobs. The PBS module is automatically loaded for you when you log in.

6.2. Queue Information

The following table describes the PBS queues available on Warhawk:

Queue Descriptions and Limits on Warhawk (queues listed from highest to lowest priority)
Priority Queue Name Max Wall Clock Time Max Cores Per Job Description
Highest urgent 168 Hours 69,888 Jobs belonging to DoD HPCMP Urgent Projects
debug 1 Hour 2,816 User testing
high 168 Hours 69,888 Jobs belonging to DoD HPCMP High Priority Projects
frontier 168 Hours 69,888 Jobs belonging to DoD HPCMP Frontier Projects
standard 168 Hours 69,888 Standard jobs
HIE 24 Hours 256 Rapid response for interactive work
transfer 48 Hours 1 Data transfer for user jobs
Lowest background 120 Hours 2,816 Unrestricted access - no allocation charge
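
Before submitting, it can help to compare a request against the limits in the table above. The following sketch hard-codes those limits and checks a requested wall time (in whole hours) and core count. The function names are hypothetical, and the limits are a snapshot of this table, so the show_queues utility remains the authoritative source.

```shell
#!/bin/bash
# Limits from the queue table above, printed as "max_hours max_cores".
queue_limits() {
  case "$1" in
    urgent|high|frontier|standard) echo "168 69888" ;;
    debug)      echo "1 2816" ;;
    HIE)        echo "24 256" ;;
    transfer)   echo "48 1" ;;
    background) echo "120 2816" ;;
    *) return 1 ;;
  esac
}

# usage: check_request <queue> <hours> <cores>; returns 0 if within limits
check_request() {
  local queue=$1 hours=$2 cores=$3 limits
  limits=$(queue_limits "$queue") || { echo "unknown queue: $queue"; return 2; }
  set -- $limits                     # $1 = max hours, $2 = max cores
  if (( hours > $1 || cores > $2 )); then
    echo "$queue: request exceeds limits ($1 h, $2 cores)"
    return 1
  fi
  echo "$queue: OK"
}

check_request standard 48 256
check_request debug 2 256 || true    # 2 hours is over the 1-hour debug limit
```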

6.3. Interactive Logins

When you log into Warhawk, you will be running in an interactive shell on a login node. The login nodes provide login access for Warhawk and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse Policy. The preferred method to run resource-intensive executions is to use an interactive batch session.

6.4. Interactive Batch Sessions

To use the interactive batch environment, you must first acquire an interactive batch shell. This is done by executing a qsub command with the "-I" option from within the interactive environment. For example:

qsub -l select=N1:ncpus=NUM:mpiprocs=N2 -A Project_ID -q queue_name -l walltime=HHH:MM:SS -I

You must specify the desired maximum walltime. The number of nodes requested (N1) defaults to 1. The number of cores per node (NUM) defaults to 128. The number of processes per node (N2) defaults to NUM. The project ID (Project_ID) defaults to the environment variable $ACCOUNT. The job queue defaults to standard. Valid values for NUM and N2 are between 1 and 128.
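
The defaults described above can be made explicit with a small helper that prints the full qsub command. This is only a sketch: the function name interactive_qsub is hypothetical, and it prints the command rather than submitting it.

```shell
#!/bin/bash
# Print an interactive qsub command, filling in the defaults described above.
# usage: interactive_qsub <walltime> [nodes] [ncpus] [mpiprocs] [queue]
interactive_qsub() {
  local walltime=${1:?walltime HHH:MM:SS is required}
  local nodes=${2:-1}            # N1 defaults to 1
  local ncpus=${3:-128}          # NUM defaults to 128
  local mpiprocs=${4:-$ncpus}    # N2 defaults to NUM
  local queue=${5:-standard}     # queue defaults to standard
  echo "qsub -l select=${nodes}:ncpus=${ncpus}:mpiprocs=${mpiprocs}" \
       "-A ${ACCOUNT:-Project_ID} -q ${queue} -l walltime=${walltime} -I"
}

interactive_qsub 00:30:00
interactive_qsub 01:00:00 2 128 64 debug
```

If $ACCOUNT is not set in your environment, the sketch prints the literal placeholder Project_ID, which you would replace with your own project ID.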

Your interactive batch sessions will be scheduled just as normal batch jobs are scheduled depending on the other queued batch jobs, so it may take quite a while. Once your interactive batch shell starts, you can run or debug interactive applications, post-process data, etc.

At this point, you can launch parallel applications on your assigned set of compute nodes by using the mpiexec command. You can also run interactive commands or scripts on this node.

6.5. Batch Request Submission

PBS batch jobs are submitted via the qsub command. The format of this command is:

qsub [ options ] batch_script_file

qsub options may be specified on the command line or embedded in the batch script file by lines beginning with "#PBS".

For a more thorough discussion of PBS batch submission on Warhawk, see the Warhawk PBS Guide.

6.6. Batch Resource Directives

Batch resource directives allow you to specify to PBS how your batch jobs should be run and the resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.

The basic syntax of PBS directives is as follows:

#PBS option[[=]value]

where some options may require values to be included. For example, to start a 64-process job, you would request one node of 128 cores and specify you will be running 64 processes per node:

#PBS -l select=1:ncpus=128:mpiprocs=64
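
The total number of MPI processes implied by a select statement is the number of nodes multiplied by mpiprocs. As a quick check, the sketch below computes that product. It handles only the single-chunk form used in this guide, and the function name total_mpiprocs is hypothetical.

```shell
#!/bin/bash
# Total MPI processes = (number of nodes) x (mpiprocs per node).
# Handles only the form select=N:ncpus=C:mpiprocs=M used in this guide.
total_mpiprocs() {
  local spec=${1#select=} nodes mpiprocs
  nodes=${spec%%:*}
  if [[ $spec =~ mpiprocs=([0-9]+) ]]; then
    mpiprocs=${BASH_REMATCH[1]}
  else
    echo "mpiprocs not given in: $1" >&2
    return 1
  fi
  echo $(( nodes * mpiprocs ))
}

total_mpiprocs select=1:ncpus=128:mpiprocs=64    # 64
total_mpiprocs select=2:ncpus=128:mpiprocs=128   # 256
```

The result is the value you would pass to "mpiexec -n" to fill every requested process slot.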

The following directives are required for all jobs:

Required PBS Directives
Directive Value Description
-A Project_ID Name of the project (uses $ACCOUNT on AFRL DSRC systems if not specified)
-q queue_name Name of the queue
-l ncpus=# Number of cores
-l walltime=HHH:MM:SS Maximum wall time

Optional Directives
Directive Value Description
-N Job Name Name of the job.
-e File name Redirect standard error to the named file.
-o File name Redirect standard output to the named file.
-j oe Merge standard error and standard output into standard output.
-l application application_name Identify the application being used.
-I Request an interactive batch shell.
-V Export all environment variables to the job.
-v Variable list Export specific environment variables to the job.

A more complete listing of batch resource directives is available in the Warhawk PBS Guide.

6.7. Launch Commands

There are different commands for launching MPI executables from within a batch job depending on which MPI implementation your script uses.

To launch a Cray MPICH executable, use the mpiexec command as follows:

mpiexec -n #_of_MPI_tasks ./mpijob.exe

To launch an SGI MPT executable, use the mpiexec command as follows:

module swap cray-pals cray-pals
mpiexec -n #_of_MPI_tasks ./mpijob.exe

For OpenMP executables, no launch command is needed.
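
A minimal sketch of running an OpenMP executable directly is shown below. It assumes the thread count is set with OMP_NUM_THREADS, the standard OpenMP environment variable, which this guide does not otherwise discuss. The executable name is a placeholder.

```shell
#!/bin/bash
# Run an OpenMP executable directly; no MPI launcher is involved.
# OMP_NUM_THREADS is the standard OpenMP control for the thread count;
# 128 matches the number of cores on one Warhawk node.
export OMP_NUM_THREADS=128

# my_openmp_prog.exe is a hypothetical executable name.
if [ -x ./my_openmp_prog.exe ]; then
  ./my_openmp_prog.exe
else
  echo "would run ./my_openmp_prog.exe with $OMP_NUM_THREADS threads"
fi
```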

6.8. Sample Scripts

While it is possible to include all PBS directives at the qsub command line, the preferred method is to embed the PBS directives within the batch request script using "#PBS". The scripts below are basic examples that contain all of the required directives, some frequently used optional directives, and common script components. The serial example runs on a single core. The parallel example starts 256 processes on 2 nodes of 128 cores each, with one MPI process per core. More thorough examples are available in the Warhawk PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Warhawk.

The following example is a good starting template for a batch script to run a serial job for one hour:

#!/bin/bash
# The first line (above) specifies your shell
#
# Specify name of the job
#PBS -N serialjob
#
# Write std output to file serialjob.out
#PBS -o serialjob.out
#
# Write std error to file serialjob.err
#PBS -e serialjob.err
#
# Specify Project ID to be charged (Required)
#PBS -A Project_ID
#
# Request wall clock time of 1 hour (Required)
#PBS -l walltime=01:00:00
#
# Specify queue name (Required)
#PBS -q standard
#
# Specify the number of cores (Required)
#PBS -l select=1:ncpus=1
#
#PBS -S /bin/bash
# Change to the specified directory
cd $WORKDIR
#
# Execute the serial executable on 1 core
./serial_fort.exe
# End of batch job

The first few lines tell PBS to save the standard output and error output to the given files, and to give the job a name. Skipping ahead, we estimate the run-time to be about one hour and know that this is acceptable for the standard batch queue. We need one core in total, so we request one core. The resource allocation is one 128-core node for exclusive use by the job.

Important! Except for the transfer queue, which uses shared nodes, resource requests and charging are for exclusive 128-core nodes.
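
To see what exclusive-node charging means in practice, the sketch below rounds a core request up to whole 128-core nodes and multiplies by the run time. It assumes usage is counted in core-hours over every core of each allocated node. The function names are hypothetical, and the actual accounting rules are set by the HPCMP.

```shell
#!/bin/bash
CORES_PER_NODE=128   # cores on one Warhawk compute node

# Whole nodes needed for a given core count (rounded up).
nodes_needed() {
  echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# Core-hours for a job, counting every core on each allocated node.
# usage: node_core_hours <cores_requested> <hours>
node_core_hours() {
  echo $(( $(nodes_needed "$1") * CORES_PER_NODE * $2 ))
}

node_core_hours 1 1      # 1-core serial job for 1 hour: 128
node_core_hours 256 2    # 256-core parallel job for 2 hours: 512
```

Under this assumption, a one-core job costs the same as a job that keeps all 128 cores of the node busy for the same time.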

The following example is a good starting template for a batch script to run a parallel (MPI) job for two hours:

#!/bin/bash
## The first line (above) specifies the shell to use for parsing 
## the remaining lines of the batch script.
#
## Required PBS Directives --------------------------------------
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=2:ncpus=128:mpiprocs=128
#PBS -l walltime=02:00:00
#
## Optional PBS Directives --------------------------------------
#PBS -N Test_Run_1
#PBS -j oe
#PBS -V
#PBS -S /bin/bash
#
## Execution Block ----------------------------------------------
# Environment Setup
# Get sequence number of job identifier
JOBID=`echo $PBS_JOBID | cut -d '.' -f 1`
# cd to job-specific directory in your personal directory in
# the scratch file system ($WORKDIR/$PBS_JOBID)
cd $JOBDIR
#
# Launching
# copy executable from $HOME and launch it
cp $HOME/mympiprog.exe .
mpiexec -n 256 ./mympiprog.exe > mympiprog.out
#
# Clean up
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd $WORKDIR
rm -f archive_job
cat > archive_job << END
#!/bin/bash
#PBS -l walltime=06:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd $JOBDIR
mkdir $ARCHIVE_HOME/$JOBID
cp -r ./* $ARCHIVE_HOME/$JOBID
ls -l $ARCHIVE_HOME/$JOBID
END
#
# Submit the archive job script.
qsub archive_job
# End of batch job

The required directives identify the project to charge, the queue, the resources, and the maximum wall time. We estimate the run time to be about 2 hours and know this is acceptable for the standard batch queue. The select statement requests 2 nodes with 128 cores and 128 MPI processes per node, for 256 cores in total. The optional directives name the job and merge standard error into standard output. The default value for number of cores per node is 128.

Additional examples are available in the Warhawk PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Warhawk.
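
The parallel script above writes archive_job with a here document whose delimiter (END) is not quoted. The shell therefore expands $JOBDIR, $ARCHIVE_HOME, and $JOBID while the parent job is running, so the generated script contains the actual paths. The following sketch shows the difference between an unquoted and a quoted delimiter. The JOBID value and the archive/ path are made up for illustration.

```shell
#!/bin/bash
# Show when variables inside a here document are expanded.
JOBID=12345                      # hypothetical job sequence number
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT     # remove the scratch files on exit

# Unquoted delimiter: $JOBID is expanded now, while the file is written.
cat > "$tmpdir/expanded" << END
mkdir archive/$JOBID
END

# Quoted delimiter: the text is written literally, with no expansion.
cat > "$tmpdir/literal" << 'END'
mkdir archive/$JOBID
END

cat "$tmpdir/expanded"   # mkdir archive/12345
cat "$tmpdir/literal"    # mkdir archive/$JOBID
```

The unquoted form is what lets the archive job know the parent job's sequence number without being passed any extra arguments.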

6.9. PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.

qsub [qsub_options] my_job_script

qstat: Used to check the status of submitted jobs.

qstat PBS_JOBID        ##check one job
qstat -u my_user_name  ##check all of user's jobs

qdel: Used to kill queued or running jobs.

qdel PBS_JOBID
qdel -Wforce PBS_JOBID ##Remove a job if one or more of its allocated nodes are down

A more complete list of PBS commands is available in the Warhawk PBS Guide.

6.10. Determining Time Remaining in a Batch Job

In batch jobs, knowing the time remaining before the workload management system will kill the job enables the user to write restart files or even prepare input for the next job submission. However, adding such capability to an existing source code requires knowing how to query the workload management system and how to parse the resulting output to determine the amount of remaining time.

DoD HPCMP allocated systems now provide the WLM_TIME library as an easy way to make the remaining time in a batch job available to C, C++, and Fortran programs. The library can be called from your code as follows:

For C:

#include <wlm_time.h>
void wlm_time_left(long int *seconds_left)

For Fortran:

SUBROUTINE WLM_TIME_LEFT(seconds_left)
INTEGER seconds_left

For C++:

extern "C" {
#include <wlm_time.h>
}
void wlm_time_left(long int *seconds_left)

For simplicity, wall-clock-time remaining is returned as an integer value of seconds.

To simplify usage, a module file defines the process environment, and a pkg-config metadata file defines the necessary compiler linker options:

For C:

module load wlm_time
$(CC) ctest.c `pkg-config --cflags --libs wlm_time`

For Fortran:

module load wlm_time
$(F90) test.f90 `pkg-config --cflags-only-I --libs wlm_time`

For C++:

module load wlm_time
$(CXX) Ctest.C `pkg-config --cflags --libs wlm_time`

WLM_TIME currently works with PBS. The developers expect that WLM_TIME will continue to provide a uniform interface encapsulating the underlying aspects of the workload management system.

6.11. Advance Reservations

A subset of Warhawk's nodes has been set aside for use as part of the Advance Reservation Service (ARS). The ARS allows users to reserve a user-designated number of nodes for a specified number of hours starting at a specific date/time. This service enables users to execute interactive or other time-critical jobs within the batch system environment. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. The ARS User Guide is available on HPC Centers.

7. Software Resources

7.1. Application Software

A complete listing with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.

7.2. Useful Utilities

The following utilities are available on Warhawk. For command-line syntax and examples of usage, please see each utility's online man page.

Baseline Configuration Commands and Tools
Name Description
archive Perform basic file-handling operations on the archive system
bcmodule An enhanced version of the standard module command
check_license Check the status of licenses for HPCMP shared applications
cqstat Display information about running and pending batch jobs
mpscp High-performance remote file copy
node_use Display the amount of free and used memory for login nodes
qflag Report a problem with a batch job to the HPCMP Help Desk
qhist Print tracing information for a batch job
qpeek Display spooled stdout and stderr for an executing batch job
qview Display information about batch jobs and queues
scampi Transfer data between systems using multiple streams and sockets
show_queues Report current batch queue status, usage, and limits
show_storage Display disk/file usage and quota information
show_usage Display CPU allocation and usage by subproject
tube Copy files to a remote system using Kerberos host authentication

7.3. Sample Code Repository

The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area and is automatically defined in your login environment.

8. Links to Vendor Documentation

HPE Home: https://www.hpe.com

SUSE Home: https://suse.com
SUSE Linux Enterprise Server: https://suse.com/products/server

GNU Home: http://www.gnu.org
GNU Compiler: http://gcc.gnu.org/onlinedocs

Intel Home: http://www.intel.com
Intel Documentation: http://software.intel.com/en-us/intel-software-technical-documentation
Intel Compiler List: http://software.intel.com/en-us/intel-compilers

TotalView Documentation: https://docs.roguewave.com/en/totalview/current/
DDT Tutorials: http://www.allinea.com/tutorials