HPE SGI 8600 (Mustang) User Guide
Table of Contents
- 1. Introduction
- 1.1. Document Scope and Assumptions
- 1.2. Policies to Review
- 1.3. Obtaining an Account
- 1.4. Requesting Assistance
- 2. System Configuration
- 2.1. System Summary
- 2.2. Processors
- 2.3. Memory
- 2.4. Operating System
- 2.5. File Systems
- 2.5.1. /p/home
- 2.5.2. /p/work1
- 2.5.3. /p/cwfs
- 2.6. Peak Performance
- 3. Accessing the System
- 3.1. Kerberos
- 3.2. Logging In
- 3.3. File Transfers
- 4. User Environment
- 4.1. User Directories
- 4.1.1. Home Directory
- 4.1.2. Work Directory
- 4.1.3. Center Directory
- 4.2. Shells
- 4.3. Environment Variables
- 4.3.1. Login Environment Variables
- 4.3.2. Batch-Only Environment Variables
- 4.4. Modules
- 4.5. Archive Usage
- 4.5.1. Archival Command Synopsis
- 5. Program Development
- 5.1. Programming Models
- 5.1.1. Message Passing Interface (MPI)
- 5.1.2. Open Multi-Processing (OpenMP)
- 5.1.3. Hybrid Processing (MPI/OpenMP)
- 5.2. Available Compilers
- 5.2.1. Intel Compiler Environment
- 5.2.2. Portland Group (PGI) Compiler Suite
- 5.2.3. GNU Compiler Collection
- 5.3. Relevant Modules
- 5.4. Libraries
- 5.4.1. Intel Math Kernel Library (Intel MKL)
- 5.4.2. Additional Math Libraries
- 5.5. Debuggers
- 5.5.1. GDB
- 5.5.2. DDT
- 5.6. Code Profiling and Optimization
- 5.6.1. gprof
- 5.6.2. Codecov
- 5.6.3. Additional Profiling Tools
- 5.6.4. Program Development Reminders
- 5.6.5. Compiler Optimization Options
- 5.6.6. Performance Optimization Methods
- 6. Batch Scheduling
- 6.1. Scheduler
- 6.2. Queue Information
- 6.3. Interactive Logins
- 6.4. Interactive Batch Sessions
- 6.5. Batch Request Submission
- 6.6. Batch Resource Directives
- 6.7. Launch Commands
- 6.8. Sample Scripts
- 6.9. PBS Commands
- 6.10. Determining Time Remaining in a Batch Job
- 6.11. Advance Reservations
- 7. Software Resources
- 7.1. Application Software
- 7.2. Useful Utilities
- 7.3. Sample Code Repository
- 8. Links to Vendor Documentation
- 8.1. HPE SGI Links
- 8.2. Red Hat Links
- 8.3. GNU Links
- 8.4. Portland Group (PGI) Links
- 8.5. Intel Links
1. Introduction
1.1. Document Scope and Assumptions
This document provides an overview and introduction to the use of the HPE SGI 8600, Mustang, located at the AFRL DSRC, along with a description of the specific computing environment on Mustang. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:
- Use of the UNIX operating system
- Use of an editor (e.g., vi or emacs)
- Remote usage of computer systems via network or modem access
- A selected programming language and its related tools and libraries
1.2. Policies to Review
All policies are discussed in the AFRL DSRC Introductory Site Guide. All users running at the AFRL DSRC are expected to know, understand, and follow the policies discussed. If you have any questions about AFRL DSRC's policies, please contact the HPC Help Desk.
1.3. Obtaining an Account
The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining an Account and follow the instructions there. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.
1.4. Requesting Assistance
The HPC Help Desk is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).
- Web: https://helpdesk.hpc.mil
- E-mail: help@helpdesk.hpc.mil
- Phone: 1-877-222-2039 or (937) 255-0679
For more detailed contact information, please see our Contact Page.
2. System Configuration
2.1. System Summary
Mustang is an HPE SGI 8600 system. The login and compute nodes are populated with Intel Xeon Platinum 8168 (Skylake) processors clocked at 2.7 GHz. Mustang uses the Intel Omni-Path interconnect in a Non-Blocking Fat Tree as its high-speed network for MPI messages and I/O traffic. Mustang uses Lustre to manage its parallel file system that targets the disk RAID arrays.
Mustang has 1176 compute nodes that share memory only on the node; memory is not shared across the nodes.
Each standard compute node has two 24-core processors (48 cores) sharing 192 GB of DDR4 memory, with no user-accessible swap space.
Each large-memory compute node has two 24-core processors (48 cores) sharing 768 GB of DDR4 memory, with no user-accessible swap space.
Each GPU compute node has two 24-core processors (48 cores) and one NVIDIA Tesla P100 GPU, runs its own Red Hat Enterprise Linux operating system, and shares 384 GB of DDR4 memory, with no user-accessible swap space.
Mustang is rated at 4.87 peak PFLOPS and has 8.4 PB (formatted) of parallel disk storage.
Mustang is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (e.g., memory, I/O, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.
 | Login | Standard | Large-Memory | GPU |
---|---|---|---|---|
Total Nodes | 12 | 1,128 | 24 | 24 |
Processor | Intel 8168 Skylake | Intel 8168 Skylake | Intel 8168 Skylake | Intel 8168 Skylake |
Processor Speed | 2.7 GHz | 2.7 GHz | 2.7 GHz | 2.7 GHz |
Sockets / Node | 2 | 2 | 2 | 2 |
Cores / Node | 48 | 48 | 48 | 48 |
Total CPU Cores | 576 | 54,144 | 1,152 | 1,152 |
Usable Memory / Node | 380 GB | 180 GB | 744 GB | 372 GB |
Accelerators / Node | None | None | None | 1 |
Accelerator | n/a | n/a | n/a | NVIDIA P100 PCIe 3 |
Memory / Accelerator | n/a | n/a | n/a | 16 GB |
Storage on Node | None | None | None | None |
Interconnect | Intel Omni-Path | Intel Omni-Path | Intel Omni-Path | Intel Omni-Path |
Operating System | RHEL | RHEL | RHEL | RHEL |
Path | Formatted Capacity | File System Type | Storage Type | User Quota | Minimum File Retention |
---|---|---|---|---|---|
/p/home ($HOME) | 691 TB | Lustre | HDD | 100 GB | None |
/p/work1 ($WORKDIR) | 8.4 PB | Lustre | HDD | None | 21 Days |
/p/cwfs ($CENTER) | 3.3 PB | GPFS | HDD | 100 TB | 120 Days |
/p/work1/projects ($PROJECTS_HOME) | 8.4 PB | Lustre | HDD | None | None |
2.2. Processors
Mustang uses the 2.7-GHz Intel Skylake Xeon processors (XSPF 8168) on its login, standard-memory and large-memory compute nodes. There are two processors per node with 24 cores, for a total of 48 cores per node. Each processor has a 33-MB L3 cache.
GPU nodes use the 2.7-GHz Intel Skylake Xeon processors (XSPF 8168). There are two processors per node with 24 cores, for a total of 48 cores per node. Each processor has a 33-MB L3 cache. Each GPU node has an NVIDIA Tesla P100 GPU with 3,584 cores.
2.3. Memory
Mustang uses both shared- and distributed-memory models. Memory is shared among all the cores on a node, but is not shared among the nodes across the cluster.
Each login node contains 384 GB of main memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use more than 8 GB of memory at any one time.
Each standard compute node contains 180 GB of user-accessible shared memory.
Each large-memory compute node contains 744 GB of user-accessible shared memory.
Each GPU node consists of a standard compute node paired with an NVIDIA Tesla P100 GPU. The compute portion contains 372 GB of user-accessible shared memory, and the GPU itself has approximately 16 GB of memory that is accessible only through the GPU.
2.4. Operating System
Mustang's operating system is Red Hat Enterprise Linux (RHEL).
2.5. File Systems
Mustang has the following file systems available for user storage:
2.5.1. /p/home
This file system is locally mounted from Mustang's Lustre file system and has a formatted capacity of 691 TB. All users have a home directory located on this file system which can be referenced by the environment variable $HOME.
2.5.2. /p/work1
This file system is locally mounted from Mustang's Lustre file system and is tuned for parallel I/O. It has a formatted capacity of 8.4 PB. All users have a work directory located on this file system which can be referenced by the environment variable $WORKDIR. This file system is not backed up. Users are responsible for making backups of their files to the archive server or to some other local system.
Maintaining the high performance of the Lustre file system is important for the efficient and effective use of Mustang by all users. You should take steps to ensure your file storage and access methods follow the suggested guidelines as described in the Lustre User Guide.
2.5.3. /p/cwfs
This path is directed to the Center-Wide File System (CWFS), which is meant for short-term storage (no longer than 120 days). All users have a directory defined in this file system which can be referenced by the environment variable $CENTER. It is accessible from both the compute nodes and the HPC systems' login nodes. The CWFS has a formatted capacity of 3.3 PB and is managed by IBM Spectrum Scale (formerly GPFS).
2.6. Peak Performance
Mustang is rated at 4.87 peak PFLOPS.
3. Accessing the System
3.1. Kerberos
A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Mustang. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.
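Once a client kit is installed, obtaining and checking a ticket generally involves the standard Kerberos utilities, as in the sketch below; the exact invocation and prompts depend on the kit installed on your desktop, so treat this only as an illustration and see HPC Centers: Kerberos & Authentication for the authoritative procedure.
% kinit        ## obtain a Kerberos ticket (you will be prompted for your credentials)
% klist        ## verify the ticket and view its expiration time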
3.2. Logging In
The system host name for the Mustang cluster is mustang.afrl.hpc.mil, which will redirect the user to one of twelve login nodes. Hostnames and IP addresses to these nodes are available upon request from the HPC Help Desk.
The preferred way to login to Mustang is via ssh, as follows:
% ssh -l username mustang.afrl.hpc.mil
3.3. File Transfers
File transfers to DSRC systems (except those to the local archive system) must be performed using Kerberized versions of the following tools: scp, ftp, sftp, and mpscp.
The command below uses secure copy (scp) to copy a single local file into a destination directory on a Mustang login node. The mpscp command is similar to the scp command, but has a different underlying means of data transfer, and may enable greater transfer rates. The mpscp command has the same syntax as scp.
% scp local_file user@mustang.afrl.hpc.mil:/target_dir
Both scp and mpscp can be used to send multiple files. This command transfers all files with the .txt extension to the same destination directory.
% scp *.txt user@mustang.afrl.hpc.mil:/target_dir
The example below uses the secure file transfer protocol (sftp) to connect to Mustang, then uses the sftp "cd" and "put" commands to change to the destination directory and copy a local file there. The sftp "quit" command ends the sftp session. Use the sftp "help" command to see a list of all sftp commands.
% sftp user@mustang.afrl.hpc.mil
sftp> cd target_dir
sftp> put local_file
sftp> quit
The Kerberized file transfer protocol (kftp) command differs from sftp in that your username is not specified on the command line, but given later when prompted. The kftp command may not be available in all environments.
% kftp mustang.afrl.hpc.mil
username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit
Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.
4. User Environment
4.1. User Directories
The following user directories are provided for all users on Mustang:
4.1.1. Home Directory
When you log on to Mustang, you will be placed in your home directory, /p/home/username. The environment variable $HOME is automatically set for you and refers to this directory. $HOME is visible to both the login and compute nodes, and may be used to store small user files. It has an initial quota of 100 GB. $HOME is not intended as permanent storage, but files stored in $HOME are not subject to being purged.
4.1.2. Work Directory
Mustang has one large file system, /p/work1, for the temporary storage of data files needed for executing programs. You may access your personal working directory by using the $WORKDIR environment variable, which is set for you upon login. Your $WORKDIR directory has an initial quota of 10 TB. Your $WORKDIR and the /p/work1 file system will fill up as jobs run. Please review the File Space Management Policy and be mindful of your disk usage.
REMEMBER: /p/work1 is a "scratch" file system and is not backed up. You are responsible for managing files in your $WORKDIR by backing up files to the MSAS and deleting unneeded files when your jobs end. Please review the Archive User Guide for details.
All of your jobs should execute from your $WORKDIR directory, not $HOME. While not technically forbidden, jobs that are run from $HOME are subject to smaller disk space quotas and have a much greater chance of failing if problems occur with that resource. Jobs that are run entirely from your $WORKDIR directory are more likely to complete, even if all other resources are temporarily unavailable.
Maintaining the high performance of the Lustre file system is important for the efficient and effective use of Mustang by all users. You should take steps to ensure your file storage and access methods follow the suggested guidelines as described in the Lustre User Guide.
If you use $WORKDIR in your batch scripts, you must be careful to avoid having one job accidentally contaminate the files of another job. One way to avoid this is to use the $JOBDIR (or $WORK_DIR) directory, which is unique to each job on the system. The $JOBDIR directory is not subject to the File Space Management Policy until the job exits the workload management system.
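As an illustration of the guidance above, the following batch-script fragment runs a job from its own job-specific directory ($JOBDIR); the input file and executable names are placeholders:
# Run from the job-specific directory to keep this job's files separate
cd $JOBDIR
cp $HOME/my_input.dat .       ## placeholder input file
cp $HOME/my_program.exe .     ## placeholder executable
mpiexec_mpt -np 48 ./my_program.exe > my_program.out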
4.1.3. Center Directory
The Center-Wide File System (CWFS) provides file storage that is accessible from Mustang's login nodes, and from the HPC Portal. The CWFS permits file transfers and other file and directory operations from Mustang using simple Linux commands. Each user has their own directory in the CWFS. The name of your CWFS directory may vary between machines and between centers, but the environment variable $CENTER will always refer to this directory.
The example below shows how to copy a file from the CWFS ($CENTER). While logged into Mustang, copy your file from your work directory to the CWFS.
% cp $WORKDIR/filename $CENTER
4.2. Shells
The following shells are available on Mustang: csh, bash, ksh, tcsh, zsh, and sh. To change your default shell, please email a request to require@hpc.mil. Your preferred shell will become your default shell on the Mustang cluster within 1-2 working days.
4.3. Environment Variables
A number of environment variables are provided by default on all HPCMP HPC systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems.
4.3.1. Login Environment Variables
The following environment variables are common to both the login and batch environments:
Variable | Description |
---|---|
$ARCHIVE_HOME | Your directory on the archive server. |
$ARCHIVE_HOST | The host name of the archive server. |
$BC_HOST | The generic (not node specific) name of the system. |
$CC | The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded. |
$CENTER | Your directory on the Center-Wide File System (CWFS). |
$CSE_HOME | This variable contains the path to the base directory of the default installation of the Computational Science Environment (CSE) installed on a particular compute platform. (See BC policy FY13-01 for CSE details.) |
$CSI_HOME | The directory containing the following list of heavily used application packages: ABAQUS, Accelrys, ANSYS, CFD++, Cobalt, EnSight, Fluent, GASP, Gaussian, LS-DYNA, and MATLAB, formerly known as the Consolidated Software Initiative (CSI) list. Other application software may also be installed here by our staff. |
$CXX | The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded. |
$DAAC_HOME | The directory containing DAAC-supported visualization tools: ParaView, VisIt, and EnSight. |
$F77 | The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded. |
$F90 | The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded. |
$HOME | Your home directory on the system. |
$JAVA_HOME | The directory containing the default installation of Java. |
$KRB5_HOME | The directory containing the Kerberos utilities. |
$PET_HOME | The directory containing the tools formerly installed and maintained by the PET staff. This variable is deprecated and will be removed from the system in the future. Certain tools will be migrated to $COST_HOME, as appropriate. |
$PROJECTS_HOME | A common directory where group-owned and supported applications and codes may be maintained for use by members of a group. Any project may request a group directory under $PROJECTS_HOME. |
$SAMPLES_HOME | The Sample Code Repository. This is a collection of sample scripts and codes provided and maintained by our staff to help users learn to write their own scripts. There are a number of ready-to-use scripts for a variety of applications. |
$WORKDIR | Your work directory on the local temporary file system (i.e., local high-speed disk). |
4.3.2. Batch-Only Environment Variables
In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.
Variable | Description |
---|---|
$BC_CORES_PER_NODE | The number of cores per node for the compute node on which a job is running. |
$BC_MEM_PER_NODE | The approximate maximum user-accessible memory per node (in integer MB) for the compute node on which a job is running. |
$BC_MPI_TASKS_ALLOC | The number of MPI tasks allocated for a job. |
$BC_NODE_ALLOC | The number of nodes allocated for a job. |
$JOBDIR | Job-specific directory in $WORKDIR immune to scrubbing while job is active. |
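For example, a batch script might use these variables to record the resources it received and to size its MPI launch; the executable name below is a placeholder:
echo "Nodes allocated:     $BC_NODE_ALLOC"
echo "Cores per node:      $BC_CORES_PER_NODE"
echo "MPI tasks allocated: $BC_MPI_TASKS_ALLOC"
mpiexec_mpt -np $BC_MPI_TASKS_ALLOC ./my_program.exe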
4.4. Modules
Software modules are a convenient way to set needed environment variables and include necessary directories in your path so that commands for particular applications can be found. Mustang uses "modules" to initialize your environment with COTS application software, system commands and libraries, compiler suites, environment variables, and PBS batch system commands.
A number of modules are loaded automatically as soon as you log in. To see the modules that are currently loaded, use the "module list" command. To see the entire list of available modules, use "module avail". You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
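A typical sequence looks like the following; the module names shown (mpt and compiler/intelmpi) are the ones referenced elsewhere in this guide, and the versions available to you may differ:
% module list                     ## show currently loaded modules
% module avail                    ## show all available modules
% module unload mpt               ## remove a module from your environment
% module load compiler/intelmpi   ## add a module to your environment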
4.5. Archive Usage
All of our HPC systems share an online Mass Storage Archival system (MSAS) with 100 TB of Tier 1 archival storage (disk cache) and 6 PB of Tier 2 high-speed archival storage utilizing a robotic tape library. Every user is given an account on the MSAS.
Kerberized login and ftp are allowed into the MSAS system. Locally developed utilities may be used to transfer files to and from the MSAS as well as to create and delete directories, rename files, and list directory contents. For convenience, the environment variable $ARCHIVE_HOME can be used to reference your MSAS archive directory when using archive commands.
4.5.1. Archival Command Synopsis
A synopsis of the main archival utilities is listed below. For information on additional capabilities, see the Archive User Guide or read the online man pages that are available on each system. These commands are non-Kerberized and can be used in batch submission scripts if desired.
Copy one or more files from the MSAS
archive get [-C path] [-s] file1 [file2...]
List files and directory contents on the MSAS
archive ls [lsopts] [file/dir ...]
Create directories on the MSAS
archive mkdir [-C path] [-m mode] [-p] [-s] dir1 [dir2 ...]
Copy one or more files to the MSAS
archive put [-C path] [-D] [-s] file1 [file2 ...]
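For example, the following sketch stores results from a job in a new directory on the MSAS and then verifies the transfer; the directory and file names are placeholders, and the precise option behavior is documented in the Archive User Guide:
archive mkdir -p my_results                           ## create a directory on the MSAS
archive put -C my_results output1.dat output2.dat     ## -C names the MSAS target directory (per the synopsis above)
archive ls my_results                                 ## verify the transfer
archive get -C my_results output1.dat                 ## later, retrieve a file from that directory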
5. Program Development
5.1. Programming Models
Mustang supports two parallel programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A hybrid MPI/OpenMP programming model is also supported. MPI is an example of a message-passing (or data-passing) model, while OpenMP uses only shared memory on a node by spawning threads. The hybrid model combines both approaches.
5.1.1. Message Passing Interface (MPI)
Mustang has MPI libraries from HPE SGI and Intel. SGI's Message Passing Toolkit (MPT) and Intel's MPI support the MPI 3.0 standard, as documented by the MPI Forum. The Message Passing Interface (MPI) is part of the software support for parallel programming across a network of computer systems through a technique known as message passing. MPI establishes a practical, portable, efficient, and flexible standard for message passing that makes use of the most attractive features of a number of existing message-passing systems, rather than selecting one of them and adopting it as the standard. See "man intro_mpi" for additional information.
When creating an MPI program on Mustang, ensure the following:
- That either the Message Passing Toolkit (module mpt) or Intel MPI (module compiler/intelmpi) has been loaded. To check this, run the "module list" command. If neither module is listed, use one of the following commands:
module load mpt
or
module load compiler/intelmpi
- That the source code includes one of the following lines:
INCLUDE "mpif.h"   ## for Fortran
or
#include <mpi.h>   ## for C/C++
Using the HPE SGI MPI Library
To compile an MPI program, use the following examples:
For C Codes:
icc -o mpi_program mpi_program.c -lmpi        ## Intel
gcc -o mpi_program mpi_program.c -lmpi        ## GNU
pgcc -o mpi_program mpi_program.c -lmpi       ## PGI
For Fortran Codes:
ifort -o mpi_program mpi_program.f -lmpi      ## Intel
gfortran -o mpi_program mpi_program.f -lmpi   ## GNU
pgf77 -o mpi_program mpi_program.f -lmpi      ## PGI
pgf90 -o mpi_program mpi_program.f90 -lmpi    ## PGI
To run an MPI program within a batch script, use the following command:
mpiexec_mpt -np mpi_procs mpi_program [user_arguments]
where mpi_procs is the number of MPI processes being started. For example:
#### Starts 96 MPI processes; 48 on each node, one per core
## request 2 nodes, each with 48 cores and 48 processes per node
#PBS -l select=2:ncpus=48:mpiprocs=48
mpiexec_mpt -np 96 ./a.out
The mpiexec_mpt command launches executables across a set of compute nodes allocated to your job and, by default, utilizes all cores and nodes available to your job. When each member of the parallel application has exited, mpiexec_mpt exits.
A common concern for MPI users is the need for more memory for each process. By default, one MPI process is started on each core of a node. This means that on Mustang, the available memory on the node is split 48 ways. To allow an individual process to use more of the node's memory, you need to start fewer processes on that node. To accomplish this, the user must request more nodes from PBS, but only run on a certain number of them. For example, the following select statement requests 8 nodes, with 48 cores per node, but only uses 12 of those cores for MPI processes:
#### Starts 96 MPI processes; only 12 on each node
## request 8 nodes, each with 48 cores and 12 processes per node
#PBS -l select=8:ncpus=48:mpiprocs=12
mpiexec_mpt -np 96 ./a.out
For more information about mpiexec_mpt, type "man mpiexec_mpt".
Using the Intel MPI Library
When compiling with the Intel MPI library on Mustang, swap the default HPE SGI MPI module for an Intel MPI module, as follows:
module unload mpt
module load compiler/intelmpi
To compile using the Intel MPI library, use one of the following examples:
mpicc -o mpi_program mpi_program.c       ## for C
mpicxx -o mpi_program mpi_program.C      ## for C++
mpif77 -o mpi_program mpi_program.f      ## for Fortran 77
mpif90 -o mpi_program mpi_program.f90    ## for Fortran 90/95
mpifc -o mpi_program mpi_program.f       ## for Fortran
The wrapper scripts will use the Intel compiler suite, if loaded, and the GCC suite if not.
To run your program within a batch script, load the Intel MPI module with which you compiled, and then use the Intel launch command, mpiexec. (The module commands are available within your batch process.) For example:
module unload mpt
module load compiler/intelmpi
mpiexec -np mpi_procs ./mpi_program
where mpi_procs is the number of processes being started. For example:
#### Starts 64 MPI processes; 8 on each node
## Request 8 nodes, each with 48 cores
#PBS -l select=8:ncpus=48:mpiprocs=8
mpiexec -np 64 ./mpi_program
For more information about mpiexec, type "man mpiexec".
5.1.2. Open Multi-Processing (OpenMP)
OpenMP is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. It supports shared-memory multiprocessing programming in C, C++, and Fortran, and consists of a set of compiler directives, library routines, and environment variables that influence compilation and run-time behavior.
When creating an OpenMP program on Mustang, ensure the following:
- If using OpenMP functions (for example, omp_get_wtime), that the source code includes one of the following lines:
INCLUDE 'omp.h'    ## for Fortran
or
#include <omp.h>   ## for C/C++
Or, if the code is written in Fortran 90 or later, the following line may be used instead:
USE omp_lib
- That the compile command includes an option to reference the OpenMP library. The Intel, PGI, and GNU compilers support OpenMP, and each one uses a different option.
To compile an OpenMP program, use the following examples:
For C codes:
icc -openmp -o OpenMP_program OpenMP_program.c        ## Intel
pgcc -mp-nonuma -o OpenMP_program OpenMP_program.c    ## PGI
gcc -fopenmp -o OpenMP_program OpenMP_program.c       ## GNU
For C++ codes:
icc -openmp -o OpenMP_program OpenMP_program.c        ## Intel
pgcc -mp-nonuma -o OpenMP_program OpenMP_program.c    ## PGI
g++ -fopenmp -o OpenMP_program OpenMP_program.c       ## GNU
For Fortran codes:
ifort -openmp -o OpenMP_program OpenMP_program.f      ## Intel
pgf77 -mp-nonuma -o OpenMP_program OpenMP_program.f   ## PGI
pgf90 -mp-nonuma -o OpenMP_program OpenMP_program.f   ## PGI
pgf95 -mp-nonuma -o OpenMP_program OpenMP_program.f   ## PGI
gfortran -fopenmp -o OpenMP_program OpenMP_program.f  ## GNU
See section 5.2 for additional information on available compilers.
When running OpenMP applications, the $OMP_NUM_THREADS environment variable determines the number of threads. If there is no explicit assignment within the job script, $OMP_NUM_THREADS is assigned the value of the ompthreads resource in the PBS select statement. If neither assignment exists, $OMP_NUM_THREADS is assigned the value of the ncpus resource in the PBS select statement. For example:
export OMP_NUM_THREADS=48
./OpenMP_program [user_arguments]
In the example above, the application starts the OpenMP_program on one node and spawns a total of 48 threads. Since Mustang has 48 cores per compute node, this yields 1 thread per core.
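Putting this together, a minimal PBS fragment for a pure OpenMP job on a single node might look like the following (the executable name is a placeholder); because ompthreads is specified, PBS also sets $OMP_NUM_THREADS to the same value, so the explicit export is shown only for clarity:
#### OpenMP job on one node using all 48 cores as threads
#PBS -l select=1:ncpus=48:ompthreads=48
export OMP_NUM_THREADS=48
./OpenMP_program [user_arguments]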
5.1.3. Hybrid Processing (MPI/OpenMP)
An application built with the hybrid model of parallel programming can run on Mustang using both OpenMP and Message Passing Interface (MPI).
When creating a hybrid (MPI/OpenMP) program on Mustang, follow the instructions in the MPI and OpenMP sections above for creating your program. Then use the compilation instructions for OpenMP.
To run a hybrid program within a batch script, set $OMP_NUM_THREADS equal to the number of threads in the team. Then launch your program using mpiexec_mpt as follows:
#### MPI/OpenMP on 4 nodes, 8 MPI processes total with 6 threads each
## request 4 nodes, each with 48 cores and 2 processes per node
#PBS -l select=4:ncpus=48:mpiprocs=2:ompthreads=6
## assign 8 MPI processes with 2 MPI processes per node
export OMP_NUM_THREADS=6
mpiexec_mpt -np 8 ./mpi_program
5.2. Available Compilers
Mustang has three programming environment suites.
- Intel
- Portland Group (PGI)
- GNU
Mustang has two MPI suites.
- SGI MPT
- Intel MPI
All versions of MPI share a common base set of compilers that are available on both the login and compute nodes.
Language | Intel | PGI | GNU | Serial/Parallel |
---|---|---|---|---|
C | icc | pgcc | gcc | Serial/Parallel |
C++ | icc | pgcc | g++ | Serial/Parallel |
Fortran 77 | ifort | pgf77 | gfortran | Serial/Parallel |
Fortran 90/95 | ifort | pgf90, pgf95 | gfortran | Serial/Parallel |
SGI MPT codes are built using the above compiler commands with the addition of the "-lmpi" option on the link line. The following additional compiler wrapper scripts are used for building Intel MPI codes:
Language | Intel | PGI | GNU | Serial/Parallel |
---|---|---|---|---|
MPI C | mpicc | mpicc | mpicc | Parallel |
MPI C++ | mpicxx | mpicc | mpicc | Parallel |
MPI f77 | mpif77 | mpif77 | mpif77 | Parallel |
MPI f90 | mpif90 | mpif90 | mpif90 | Parallel |
5.2.1. Intel Compiler Environment
The following table lists some of the more common options that you may use:
Option | Purpose |
---|---|
-c | Generate intermediate object file but do not attempt to link. |
-I directory | Search in directory for include or module files. |
-L directory | Search in directory for libraries. |
-o outfile | Name executable "outfile" rather than the default "a.out". |
-Olevel | Set the optimization level. For more information on optimization, see the section on Profiling and Optimization. |
-free | Process Fortran codes using free form. |
-fpic, or -fPIC | Generate position-independent code for shared libraries. |
-convert big_endian | Big-endian files; the default is for little-endian. |
-g | Generate symbolic debug information. |
-Minfo=all | Reports detailed information about code optimizations to stdout as compile proceeds. |
-openmp | Recognize OpenMP directives. |
-Bdynamic | Compiling using shared objects. |
-fpe-all=0 | Trap floating point, divide by zero, and overflow exceptions. |
Detailed information about these and other compiler options is available in the Intel compiler (ifort, icc, and icpc) man pages on Mustang.
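For instance, a Fortran build might combine several of the options above; the source, include, and library names here are placeholders:
ifort -c -O2 -g -free -convert big_endian -I $HOME/include my_sub.f90
ifort -o my_program my_program.f90 my_sub.o -L $HOME/lib -lmylib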
5.2.2. Portland Group (PGI) Compiler Suite
The PGI Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options that you may use:
Option | Purpose |
---|---|
-c | Generate intermediate object file but do not attempt to link. |
-I directory | Search in directory for include or module files. |
-L directory | Search in directory for libraries. |
-o outfile | Name executable "outfile" rather than the default "a.out". |
-Olevel | Set the optimization level. For more information on optimization, see the section on Profiling and Optimization. |
-Mfree | Process Fortran codes using free form. |
-i8, -r8 | Treat integer and real variables as 64-bit. |
-Mbyteswapio | Big-endian files; the default is for little-endian. |
-g | Generate symbolic debug information. |
-Mbounds | Add array bound checking. |
-Minfo=all | Reports detailed information about code optimizations to stdout as compile proceeds. |
-Mlist | Generate a file containing the compiler flags used and a line numbered listing of the source code. |
-mp=nonuma | Recognize OpenMP directives. |
-Ktrap=* | Trap errors such as floating point, overflow, and divide by zero (see man page). |
-fPIC | Generate position-independent code for shared libraries. |
Detailed information about these and other compiler options is available in the PGI compiler (pgf95, pgcc, and pgCC) man pages on Mustang.
5.2.3. GNU Compiler Collection
The GNU Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options that you may use:
Option | Purpose |
---|---|
-c | Generate intermediate object file but do not attempt to link. |
-I directory | Search in directory for include or module files. |
-L directory | Search in directory for libraries. |
-o outfile | Name executable "outfile" rather than the default "a.out". |
-Olevel | Set the optimization level. For more information on optimization, see the section on Profiling and Optimization. |
-g | Generate symbolic debug information. |
-Bstatic | Causes executable to link to all libraries statically. |
-fconvert=big-endian | Big-endian files; the default is for little-endian. |
-Wextra -Wall | Turns on increased error reporting. |
Detailed information about these and other compiler options is available in the GNU compiler (gfortran, gcc, and g++) man pages on Mustang.
5.3. Relevant Modules
By default, Mustang loads the Intel compiler and SGI MPT environments for you. For more information on using modules, see the Modules User Guide.
5.4. Libraries
5.4.1. Intel Math Kernel Library (Intel MKL)
Mustang provides the Intel Math Kernel Library (Intel MKL), a set of numerical routines tuned specifically for Intel platform processors and optimized for math, scientific, and engineering applications. The routines, which are available via both FORTRAN and C interfaces, include:
- LAPACK plus BLAS (Levels 1, 2, and 3)
- ScaLAPACK plus PBLAS (Levels 1, 2, and 3)
- Fast Fourier Transform (FFT) routines for single-precision, double-precision, single-precision complex, and double-precision complex data types
- Discrete Fourier Transforms (DFTs)
- Fast Math and Fast Vector Library
- Vector Statistical Library Functions (VSL)
- Vector Transcendental Math Functions (VML)
The MKL routines are part of the Intel Programming Environment as Intel's MKL is bundled with the Intel Compiler Suite.
Linking to the Intel Math Kernel Libraries can be complex and is beyond the scope of this introductory guide. Documentation explaining the full feature set along with instructions for linking can be found at the Intel Math Kernel Library documentation page.
Intel also makes a link advisor available to assist users with selecting proper linker and compiler options: http://software.intel.com/sites/products/mkl.
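For many common cases, however, the Intel compilers accept a convenience flag that links Intel MKL automatically; the sketch below assumes the Intel programming environment is loaded and uses a placeholder source file:
ifort -O2 -mkl=sequential -o my_program my_program.f90   ## link the single-threaded MKL libraries
ifort -O2 -mkl=parallel -o my_program my_program.f90     ## link the threaded MKL libraries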
5.4.2. Additional Math Libraries
There is also an extensive set of Math libraries available in the $PET_HOME/MATH directory on Mustang. Information about these libraries can be found on the Baseline Configuration website at BC policy FY13-01.
5.5. Debuggers
Mustang provides the GNU Project Debugger (gdb) and DDT debuggers to assist users in debugging their code.
5.5.1. GDB
The GNU Project Debugger (gdb) is a source-level debugger that can be invoked either with a program for execution or a running process id. To launch your program under gdb for debugging, use the following command:
gdb a.out corefile
To attach gdb to a program that is already executing on a node, use the following command:
gdb a.out pid
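For source-level debugging, it is usually best to compile with debug symbols and reduced optimization before invoking gdb; the sketch below uses placeholder names:
icc -g -O0 -o my_program my_program.c   ## build with debug symbols
gdb ./my_program
(gdb) break main
(gdb) run arg1 arg2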
For more information, the GDB manual can be found at http://www.gnu.org/software/gdb.
5.5.2. DDT
DDT is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, Co-array Fortran, UPC, and CUDA. Memory debugging and data visualization are supported for large-scale parallel applications. The Parallel Stack Viewer is a unique way to see the program state of all processes and threads at a glance.
DDT is a graphical debugger; therefore, you must be able to display it via a UNIX X-Windows interface. There are several ways to do this, including SSH X11 Forwarding, HPC Portal, or SRD. Follow the steps below to use DDT via X11 Forwarding or the HPC Portal.
- Choose a remote display method: X11 Forwarding, HPC Portal, or SRD. X11 Forwarding is easier but typically very slow. HPC Portal requires no extra clients and is typically fast. SRD requires an extra client but is typically fast and may be a good option if doing a significant amount of X11 Forwarding.
  - To use X11 Forwarding:
    - Ensure an X server is running on your local system. Linux users will likely have this by default, but MS Windows users need to install a third-party X Windows solution. There are various options available.
    - For Linux users, connect to Mustang using ssh -Y. Windows users need to use PuTTY with X11 forwarding enabled (Connection->SSH->X11->Enable X11 forwarding).
  - Or, to use HPC Portal:
    - Navigate to https://centers.hpc.mil/portal.
    - Select HPC Portal at AFRL.
    - Select XTerm -> AFRL -> Mustang.
  - Or, for information on using SRD, see the SRD User Guide.
- Submit an interactive job, as in the following example:
qsub -l select=1:ncpus=48:mpiprocs=48 -A Project_ID \
     -l walltime=00:30:00 -q debug -X -I
- Load the Forge DDT module:
module load forge
- Start program execution (example for 4 MPI ranks):
ddt -n 4 ./my_mpi_program arg1 arg2 ...
- The DDT window will pop up. Verify the application name and number of MPI processes. Click "Run".
An example of using DDT can be found in $SAMPLES_HOME/Programming/DDT_Example on Mustang.
5.6. Code Profiling and Optimization
Profiling is the process of analyzing the execution flow and characteristics of your program to identify sections of code that are likely candidates for optimization, which increases the performance of a program by modifying certain aspects for increased efficiency.
We provide two profiling tools, gprof and codecov, to assist you in the profiling process. In addition, a basic overview of optimization methods, with information about how they may improve the performance of your code, can be found in Performance Optimization Methods (below).
5.6.1. gprof
The GNU Project Profiler (gprof) is a profiler that shows how your program is spending its time and which function calls are made. To profile code using gprof, use the "-pg" option during compilation.
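A typical gprof workflow is sketched below; the program and file names are placeholders:
gcc -pg -o my_program my_program.c    ## instrument the build with -pg
./my_program                          ## run; writes gmon.out in the current directory
gprof ./my_program gmon.out > profile.txt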
5.6.2. Codecov
The Intel Code Coverage Tool (codecov) can be used in numerous ways to improve code efficiency and increase application performance. The tool leverages Profile-Guided Optimization (PGO) technology (discussed below). Coverage can be specified in the tool as file-level, function-level, or block-level. Another benefit of this tool is the ability to compare the profiles of two application runs to find where the optimizations are making a difference. More detailed information on this tool can be found at: https://www.intel.com/software/products/compilers.
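A sketch of the usual Intel code-coverage workflow follows; the program and project names are placeholders, and the authoritative options are in the Intel documentation linked above:
icc -prof-gen=srcpos -o my_program my_program.c   ## instrument the build for coverage
./my_program                                      ## run; writes .dyn profile data
profmerge                                         ## merge the .dyn files into pgopti.dpi
codecov -prj my_project                           ## generate the coverage report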
5.6.3. Additional Profiling Tools
There is also a set of profiling tools available in the $PET_HOME/pkgs directory on Mustang. Information about these tools may be found on the Baseline Configuration Web site at BC policy FY13-01.
5.6.4. Program Development Reminders
If an application is not programmed for distributed memory, then only the cores on a single node can be used. This is limited to 48 cores on Mustang.
Keep the system architecture in mind during code development. For instance, if your program requires more memory than is available on a single node, then you will need to parallelize your code so that it can function across multiple nodes.
5.6.5. Compiler Optimization Options
The "-Olevel" option enables code optimization when compiling. The level that you choose (0-4) will determine how aggressive the optimization will be. Increasing levels of optimization may increase performance significantly, but you should note that a loss of precision may also occur. There are also additional options that may enable further optimizations. The following table contains the most commonly used options.
Option | Description | Compiler Suite |
---|---|---|
-O0 | No Optimization. (default in GNU) | All |
-O1 | Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimization. | All |
-O2 | Level 1 plus traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer. Generally safe and beneficial. (default in PGI, GNU, & Intel) | All |
-O3 | Levels 1 and 2 plus more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable. Generally beneficial. | All |
-O4 | Levels 1, 2, and 3 plus hoisting of guarded invariant floating point expressions is enabled. | PGI |
-fast, -fastsse | Chooses generally optimal flags for the target platform. Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mscalarsse -Mcache_align -Mflushz. | PGI |
-Mipa=fast,inline | Performs Interprocedural Analysis (IPA) with generally optimal IPA flags for the target platform, and inlining. IPA can be very time-consuming. Flag must be used in both compilation and linking steps. | PGI |
-Minline=levels:n | Number of levels of inlining (default: n = 1) | PGI |
-fipa-* | The GNU compilers automatically enable IPA at various -O levels. To set these manually, see the options beginning with -fipa in the gcc man page. | GNU |
-finline-functions | Enables function inlining within a single file | Intel |
-ipo[n] | Enables interprocedural optimization between files and produces up to n object files | Intel |
-inline-level=n | Number of levels of inlining (default: n=2) | Intel |
-Mlist | Creates a listing file with optimization info | PGI |
-Minfo | Info about optimizations performed | PGI |
-Mneginfo | Info on why certain optimizations are not performed | PGI |
-opt-report[n] | Generate optimization report with n levels of detail | Intel |
-xHost | Compiler generates code with the highest instruction set available on the processor. | Intel |
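As an illustration, an aggressively optimized build might combine several of the flags above; the source names are placeholders, and results should always be validated against an unoptimized build:
ifort -O3 -xHost -ipo -o my_program my_program.f90            ## Intel
pgf90 -fast -Mipa=fast,inline -o my_program my_program.f90    ## PGI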
5.6.6. Performance Optimization Methods
Optimization generally increases compilation time and executable size, and may make debugging difficult. However, it usually produces code that runs significantly faster. The optimizations that you can use will vary depending on your code and the system on which you are running.
Note: Before considering optimization, you should always ensure that your code runs correctly and produces valid output.
In general, there are four main categories of optimization:
- Global Optimization
- Loop Optimization
- Interprocedural Analysis and Optimization (IPA)
- Function Inlining
Global Optimization
A technique that looks at the program as a whole and may perform any of the following actions:
- Perform optimization over all of a program's basic blocks
- Perform control-flow and data-flow analysis for an entire program
- Detect all loops, including those formed by IF and GOTO statements, and perform general optimization
- Constant propagation
- Copy propagation
- Dead store elimination
- Global register allocation
- Invariant code motion
- Induction variable elimination
Loop Optimization
A technique that focuses on loops (for, while, etc.) in your code and looks for ways to reduce loop iterations or parallelize the loop operations. The following types of actions may be performed:
- Vectorization - rewrites loops to improve memory access performance. Some compilers may also support automatic loop vectorization by converting loops to utilize low-level hardware instructions and registers if they meet certain criteria.
- Loop unrolling - (also known as "unwinding") replicates the body of loops to reduce loop branching overhead and provide better opportunities for local optimization.
- Parallelization - divides loop operations over multiple processors where possible.
Interprocedural Analysis and Optimization (IPA)
A technique that allows the use of information across function call boundaries to perform optimizations that would otherwise be unavailable.
Function Inlining
A technique that seeks to reduce function call and return overhead. It:
- Is used with functions that are called numerous times from relatively few locations.
- Allows a function call to be replaced by a copy of the body of that function.
- May create opportunities for other types of optimization.
- May not be beneficial. Improper use may increase code size and actually result in less efficient code.
6. Batch Scheduling
6.1. Scheduler
The Portable Batch System (PBS) is currently running on Mustang. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single-processor and multiprocessor jobs. The PBS module is automatically loaded for you when you log in.
6.2. Queue Information
The following table describes the PBS queues available on Mustang:
Priority | Queue Name | Max Wall Clock Time | Max Cores Per Job | Description |
---|---|---|---|---|
Highest | urgent | 168 Hours | 28,224 | Jobs belonging to DoD HPCMP Urgent Projects |
 | debug | 1 Hour | 1,152 | User testing |
 | high | 168 Hours | 28,224 | Jobs belonging to DoD HPCMP High Priority Projects |
 | frontier | 168 Hours | 28,224 | Jobs belonging to DoD HPCMP Frontier Projects |
 | standard | 168 Hours | 28,224 | Standard jobs |
 | HIE | 24 Hours | 96 | Rapid response for interactive work |
 | transfer | 48 Hours | 1 | Data transfer for user jobs |
Lowest | background | 120 Hours | 48 | Unrestricted access - no allocation charge |
6.3. Interactive Logins
When you log in to Mustang, you will be running in an interactive shell on a login node. The login nodes provide login access for Mustang and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse Policy. The preferred method to run resource intensive executions is to use an interactive batch session.
6.4. Interactive Batch Sessions
To use the interactive batch environment, you must first acquire an interactive batch shell. This is done by executing a qsub command with the "-I" option from within the interactive environment. For example,
qsub -l select=N1:ncpus=NUM:mpiprocs=N2 -A Project_ID -q queue_name -l walltime=HHH:MM:SS -I
You must specify the desired maximum walltime. The number of nodes requested (N1) defaults to 1. The number of cores per node (NUM) defaults to 48. The number of processes per node (N2) defaults to NUM. The project ID (Project_ID) defaults to the environment variable $ACCOUNT. The job queue defaults to standard. Valid values for NUM and N2 are between 1 and 48.
Your interactive batch sessions will be scheduled just as normal batch jobs are scheduled, so depending on the other queued batch jobs, it may take quite a while. Once your interactive batch shell starts, you can run or debug interactive applications, post-process data, etc.
At this point, you can launch parallel applications on your assigned set of compute nodes by using the mpiexec_mpt command. You can also run interactive commands or scripts on this node.
6.5. Batch Request Submission
PBS batch jobs are submitted via the qsub command. The format of this command is:
qsub [ options ] batch_script_file
qsub options may be specified on the command line or embedded in the batch script file by lines beginning with "#PBS".
For a more thorough discussion of PBS batch submission on Mustang, see the Mustang PBS Guide.
6.6. Batch Resource Directives
Batch resource directives allow you to specify to PBS how your batch jobs should be run and the resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.
The basic syntax of PBS directives is as follows:
#PBS option[[=]value]
where some options may require values to be included. For example, to start a 24-process job, you would request one node of 48 cores and specify that you will be running 24 processes per node:
#PBS -l select=1:ncpus=48:mpiprocs=24
The following directives are required for all jobs:
Directive | Value | Description |
---|---|---|
-A | Project_ID | Name of the project |
-q | queue_name | Name of the queue |
-l | ncpus=# | Number of cores |
-l | walltime=HHH:MM:SS | Maximum wall time |
The following directives are optional, but are commonly used:
Directive | Value | Description |
---|---|---|
-N | Job Name | Name of the job. |
-e | File name | Redirect standard error to the name file. |
-o | File name | Redirect standard output to the name file. |
-j | oe | Merge standard error and standard output into standard output. |
-l application | application_name | Identify the application being used. |
-I | Request an interactive batch shell. | |
-V | Export all environment variables to the job. | |
-v | Variable list | Export specific environment variables to the job. |
A more complete listing of batch resource directives is available in the Mustang PBS Guide.
6.7. Launch Commands
There are different commands for launching MPI executables from within a batch job depending on which MPI implementation your script uses.
To launch an SGI MPT executable, use the mpiexec_mpt command as follows:
mpiexec_mpt -n #_of_MPI_tasks ./mpijob.exe
To launch an Intel MPI executable, use the mpiexec command as follows:
mpiexec ./mpijob.exe
For OpenMP executables, no launch command is needed.
6.8. Sample Scripts
While it is possible to include all PBS directives at the qsub command line, the preferred method is to embed the PBS directives within the batch request script using "#PBS". The following scripts are basic examples and contain all of the required directives, some frequently used optional directives, and common script components. More thorough examples are available in the Mustang PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Mustang.
The following example is a good starting template for a batch script to run a serial job for one hour:
#!/bin/bash
## The first line (above) specifies the shell to use.
#
# Specify name of the job
#PBS -N serialjob
#
# Append std output to file serialjob.out
#PBS -o serialjob.out
#
# Append std error to file serialjob.err
#PBS -e serialjob.err
#
# Specify Project ID to be charged (Required)
#PBS -A Project_ID
#
# Request wall clock time of 1 hour (Required)
#PBS -l walltime=01:00:00
#
# Specify queue name (Required)
#PBS -q standard
#
# Specify the number of cores (Required)
#PBS -l select=1:ncpus=1
#
#PBS -S /bin/bash
#
# Change to the specified directory
cd $WORKDIR
#
# Execute the serial executable on 1 core
./serial_fort.exe
# End of batch job
The first few lines tell PBS to save the standard output and error output to the given files, and to give the job a name. Skipping ahead, we estimate the run-time to be about one hour and know that this is acceptable for the standard batch queue. We need one core in total, so we request one core.
The following example is a good starting template for a batch script to run a parallel (MPI) job for two hours. It starts 96 processes on 2 nodes of 48 cores each, with one MPI process per core:
#!/bin/bash
## The first line (above) specifies the shell to use for parsing
## the remaining lines of the batch script.
#
## Required PBS Directives --------------------------------------
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=2:ncpus=48:mpiprocs=48
#PBS -l walltime=02:00:00
#
## Optional PBS Directives --------------------------------------
#PBS -N Test_Run_1
#PBS -j oe
#PBS -V
#PBS -S /bin/bash
#
## Execution Block ----------------------------------------------
# Environment Setup
# cd to your personal directory in the scratch file system
cd $WORKDIR
#
# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo $PBS_JOBID | cut -d '.' -f 1`
if [ ! -d $JOBID ]; then
  mkdir -p $JOBID
fi
cd $JOBID
#
# Launching
# copy executable from $HOME and submit it
cp $HOME/mympiprog.exe .
mpiexec_mpt -n 96 ./mympiprog.exe > mympiprog.out
#
# Clean up
# archive your results
# Using the "here document" syntax, create a job script
# for archiving your data.
cd $WORKDIR
rm -f archive_job
cat > archive_job << END
#!/bin/bash
#PBS -l walltime=06:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd $WORKDIR
rsh $ARCHIVE_HOST mkdir $ARCHIVE_HOME/$JOBID
rcp -r $JOBID $ARCHIVE_HOST:$ARCHIVE_HOME/
rsh $ARCHIVE_HOST ls -l $ARCHIVE_HOME/$JOBID
# Remove scratch directory from the file system.
rm -rf $JOBID
END
#
# Submit the archive job script.
qsub archive_job
# End of batch job
The optional directives give the job a name and merge standard error into standard output. Skipping ahead, we estimate the run-time to be about 2 hours and know that this is acceptable for the standard batch queue. The select statement sets the total number of cores and the number of cores per node for the job. This job requests 96 total cores and 48 cores per node, allowing the job to run on 2 nodes. The default value for the number of cores per node is 48.
Additional examples are available in the Mustang PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Mustang.
6.9. PBS Commands
The following commands provide the basic functionality for using the PBS batch system:
qsub: Used to submit jobs for batch processing.
qsub [qsub_options] my_job_script
qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ##check one job
qstat -u my_user_name ##check all of user's jobs
qdel: Used to kill queued or running jobs.
qdel PBS_JOBID
A more complete list of PBS commands is available in the Mustang PBS Guide.
6.10. Determining Time Remaining in a Batch Job
In batch jobs, knowing the time remaining before the workload management system will kill the job enables the user to write restart files or even prepare input for the next job submission. However, adding such a capability to an existing source code requires knowing how to query the workload management system and how to parse the resulting output to determine the amount of remaining time.
The DoD HPCMP allocated systems now provide the WLM_TIME library as an easy way to supply the remaining time in the batch job to C, C++, and Fortran programs. The library can be added to your job using the following:
For C:
#include <wlm_time.h>
void wlm_time_left(long int *seconds_left)
For Fortran:
SUBROUTINE WLM_TIME_LEFT(seconds_left)
INTEGER seconds_left
For C++:
extern "C" {
#include <wlm_time.h>
}
void wlm_time_left(long int *seconds_left)
For simplicity, wall-clock-time remaining is returned as an integer value of seconds.
To simplify usage, a module file defines the process environment, and a pkg-config metadata file defines the necessary compiler linker options:
For C:
module load wlm_time
$(CC) ctest.c `pkg-config --cflags --libs wlm_time`
For Fortran:
module load wlm_time
$(F90) test.f90 `pkg-config --cflags-only-I --libs wlm_time`
For C++:
module load wlm_time
$(CXX) Ctest.C `pkg-config --cflags --libs wlm_time`
WLM_TIME currently works with PBS. The developers expect that WLM_TIME will continue to provide a uniform interface encapsulating the underlying aspects of the workload management system.
6.11. Advance Reservations
A subset of Mustang's nodes has been set aside for use as part of the Advance Reservation Service (ARS). The ARS allows users to reserve a user-designated number of nodes for a specified number of hours starting at a specific date/time. This service enables users to execute interactive or other time-critical jobs within the batch system environment. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. The ARS User Guide is available on HPC Centers.
7. Software Resources
7.1. Application Software
A complete listing with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.
7.2. Useful Utilities
The following utilities are available on Mustang. For command-line syntax and examples of usage, please see each utility's online man page.
Name | Description |
---|---|
archive | Perform basic file-handling operations on the archive system |
bcmodule | An enhanced version of the standard module command |
check_license | Check the status of licenses for HPCMP shared applications |
cqstat | Display information about running and pending batch jobs |
mpscp | High-performance remote file copy |
node_use | Display the amount of free and used memory for login nodes |
qflag | Report a problem with a batch job to the HPCMP Help Desk |
qhist | Print tracing information for a batch job |
qpeek | Display spooled stdout and stderr for an executing batch job. |
qview | Display information about batch jobs and queues |
scampi | Transfer data between systems using multiple streams and sockets |
show_queues | Report current batch queue status, usage, and limits |
show_storage | Display disk/file usage and quota information |
show_usage | Display CPU allocation and usage by subproject |
tube | Copy files to a remote system using Kerberos host authentication |
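For example, the following invocations are common starting points; the exact arguments vary, so see each utility's man page:
% show_usage               ## CPU allocation and usage by subproject
% show_storage             ## disk/file usage and quota information
% qpeek PBS_JOBID          ## spooled stdout/stderr of a running batch job
% node_use                 ## free and used memory on the login nodes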
7.3. Sample Code Repository
The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area and is automatically defined in your login environment. Below is a listing of the examples provided in the Sample Code Repository on Mustang.
Application_Name: Use of the application name resource. | |
Sub-Directory | Description |
---|---|
application_names | README and list of valid strings for application names intended for use in every PBS script preamble. The HPCMP encourages applications not specifically named in the list to be denoted as "other". |
Applications: Application-specific examples; interactive job submit scripts; use of the application name resource; software license use. | |
abaqus | Instructions for using the abaqus automatic batch script generator as well as a sample input deck and sample script for running abaqus jobs. The abaqus module must be loaded. |
accelrys | Instructions for using the accelrys automatic batch script generator as well as a sample input deck and sample script for running accelrys jobs. The accelrys module must be loaded. |
ale3d | Instructions, sample PBS job script, and sample input data file for executing the ALE3D application on Mustang. The ale3d module needs to be loaded prior to use. |
ansys | Instructions for using the ansys automatic batch script generator as well as a sample input deck and sample script for running ansys jobs. The ansys module must be loaded. |
cart3d | A series of examples to learn how to use the CART3D application. Follow the instructions in README.txt. Sample PBS job scripts are provided as well as sample input data files. Membership in the Unix group wpcart3d is required. |
cfd++ | Instructions for using the cfd++ automatic batch script generator as well as sample input data and a sample script for running cfd++ jobs. The cfd++ module must be loaded. |
CFX | Instructions for using the CFX automatic batch script generator as well as a sample input deck and sample script for running CFX jobs. The cfx module must be loaded. |
cobalt | Instructions for using the cobalt automatic batch script generator. Sample job script for the COBALT application. Tar files for two test cases. The cobalt module must be loaded. |
cth | Instructions and PBS submission script to execute CTH jobs. The cth module should be loaded prior to use. |
espresso | Instructions and PBS submission script to execute espresso jobs. The espresso module should be loaded prior to use. |
fieldview | Instructions and PBS submission script to execute fieldview jobs. The fieldview module should be loaded prior to use. |
fluent | Instructions for using the fluent automatic batch script generator. Sample job script for the fluent application. The fluent module must be loaded. |
fun3d | README file with instructions on how to execute the FUN3D application. A PBS job script and example input files in a tar file are provided. Membership in the Unix group fun3d is required to use the tar file. |
gamess | Instructions for using the gamess automatic batch script generator, input data file, and sample PBS job script for the gamess application. The gamess module should be loaded. |
gasp | Brief instructions, two sample PBS job scripts, and two sample input data files to execute the GASP application. The GASP module must be loaded to run the application. Membership in the Unix group wpgasp is required to be able to use the input data tar files. |
gaspex | PBS job script and sample data archive for using the gaspex application. The gaspex module should be loaded. |
gaussian | Instructions for executing the gaussian PBS job script generation tool to run gaussian on Mustang. Also includes a gaussian PBS job script and an input data file. |
lsdyna | Instructions for using the ls-dyna automatic batch script generator. Sample job script for the LS-DYNA application. The lsdyna module should be loaded. |
matlab | Instructions to execute an interactive MATLAB job and a .m script to execute in it. A matlab module should be loaded. |
ncar | Instructions on how to use the NCAR Graphics tool. The appropriate ncar module must be loaded beforehand. |
openfoam | Sample PBS job scripts to execute the OPENFOAM application. Instructions on using OPENFOAM are in the scripts. In some cases, the openfoam module must be loaded. |
sierra | Instructions on how to run the SIERRA application. Membership in the sierra group is required to use sierra. |
sqlite3 | Instructions on how to use the SQLITE3 database tool. |
starccm+ | Instructions to use "submit_starccm" to create a PBS job script for starccm+, plus input data files and PBS job scripts that have already been generated. One of the starccm modules should be loaded prior to use. |
subversion | README presenting instructions and other information on how to use the SUBVERSION tool to organize software versions. |
vasp | Sample input file and PBS job script for the VASP application. |
Data_Management Archiving and retrieving files; Lustre striping; file searching; $WORKDIR use. | |
MPSCP_to_ARCHIVE | Instructions and sample scripts on using the mpscp utility for transferring files to/from file archive. |
Lustre_FS_Stripes | Instructions and examples for striping large files on the Lustre file systems. |
Postprocess_Example | Example showing how to submit a post-processing script at the end of a parallel computation job to do such things as tar data and move it from temporary storage to archive storage. |
Transfer_Queue_Example | PBS batch script examples for data transfer using the transfer queue. |
Transfer_Queue_with_Archive_Commands | Example and instructions on recommended best practice to stage data from mass storage using the transfer queue prior to job execution, then processing using that data, then passing output data back to mass storage using the transfer queue again. |
Parallel_Environment: MPI, OpenMP, and hybrid examples; single-core jobs; large memory jobs; running multiple applications within a single batch job. | |
Hello_World_Example | Examples of hello world codes using MPI, OpenMP, and a hybrid code using MPI with OpenMP threads. (A minimal hybrid sketch appears after this table.) |
Hybrid_Example | Sample Fortran and C codes and makefile for compiling hybrid MPI/OpenMP codes, and sample scripts for running hybrid MPI/OpenMP jobs. |
Large_Memory_Jobs | Example PBS script and README describing the correct queue for jobs to execute on the large-memory nodes on Mustang. |
mpic | Instructions for running multiple serial processes on multiple cores. |
Mix_Serial_with_Parallel | Demonstration of how to use $PBS_NODEFILE to set several host lists to run serial tasks and parallel tasks on different nodes in the same PBS job. Scripts are provided to demonstrate techniques using both SGI's MPT and Intel's IMPI. |
MPI_Example | Sample code and makefile for compiling MPI code to run on Mustang, and sample script for running MPI jobs. |
Multiple_exec_one_communicator | Example using 3 binaries that shows how to compile and execute a heterogeneous application on Mustang. Scripts are provided for using mpirun in SGI's MPT and mpirun in Intel's IMPI. |
Multiple_Parallel | Demonstration of how to set up and run multiple MPI tasks on different nodes in the same PBS job. Samples are provided demonstrating use of mpirun in SGI's MPT and mpirun in Intel's IMPI. |
OpenMP_Example | Sample C code and makefile for compiling OpenMP code to run on Mustang, and sample scripts for running OpenMP jobs. |
Serial_Processing_1 | C and Fortran serial program examples and sample scripts for running multiple instances of a serial program across nodes using ssh. |
Serial_Processing_2 | Fortran serial program example and sample scripts for running multiple serial compute tasks simultaneously across cores of a compute node. |
Programming: Basic code compilation; debugging; use of library files; static vs. dynamic linking; Makefiles; Endian conversion. | |
BLACS_Example | Sample BLACS Fortran program, compile script and PBS submission script. The BLACS are from Netlib's ScaLAPACK library in $PET_HOME. |
Core_Files | Instructions and source code for viewing core files with gdb. This sample uses the GNU compilers. |
ddt | Instructions and sample programs for using the DDT debugger. |
Endian_Conversion | Text file discussing how to use binary data generated on a non-X86_64 platform. |
Intel_IMPI_Example | Demonstration of how to manipulate the modules and compile and execute code using Intel's IMPI. |
Memory_Usage | Presents a routine, callable from Fortran or C, for determining how much memory a process is using. (A minimal sketch appears after this table.) |
MKL_BLACS_Example | Sample BLACS Fortran program, compile script and PBS submission script. The BLACS are from Intel's Math Kernel Library (MKL). |
MKL-ScaLAPACK_Example | Sample ScaLAPACK Fortran program, compile script and PBS job script. The ScaLAPACK solver, BLACS communication, and supporting LAPACK and BLAS routines are all from Intel's MKL. |
MPI_Compilation | Discussion of how to compile MPI codes on Mustang using the Intel and GNU compilers and linking in Intel's IMPI or SGI's MPT. Includes notes on support for each MPI by the available compilers. |
Open_Files_Limit | Discussion and demonstration of the maximum number of simultaneously open files a single process may have. |
ScaLAPACK_Example | Sample ScaLAPACK Fortran program, compile script and PBS submission scripts. The linear solver routines are from Netlib's ScaLAPACK library in $PET_HOME. The LAPACK routines are from Netlib's LAPACK library in $PET_HOME, and the BLAS are from Netlib's LAPACK library in $COST_HOME. |
SO_Compile | Sample shared object compilation information, including a demonstration of how to compile and assemble a dynamically loaded, i.e., shared, library. |
Timers_Fortran | Serial timers using Fortran intrinsics for f77 and f90/95. |
User_Environment: Use of modules; customizing the login environment; use of common environment variables to facilitate portability of work between systems. | |
modules | Sample README, module description file, and module template for creation of software modules on Mustang. |
Module_Swap_Example | Batch script demonstrating use of several module commands to choose specific modules within a PBS job. |
Workload_Management: Basic batch scripting; use of the transfer queue; job arrays; job dependencies; Secure Remote Desktop; job monitoring. | |
Batchscript_Example | Simple PBS batch script showing all required preamble statements and a few optional statements. More advanced batch script showing more optional statements and a few ways to set up PBS jobs. Description of the system hardware. Process placement is described in a subdirectory. |
Core_Info_Example | Description and C language routine, callable from Fortran and C, showing how to determine the node and core placement information for MPI, OpenMP, and hybrid MPI/OpenMP PBS jobs. |
Hybrid_Example | Sample Fortran and C codes and makefile for compiling hybrid MPI/OpenMP codes, and sample scripts for running hybrid MPI/OpenMP jobs. |
Interactive_Example | C and Fortran code samples and scripts for running interactive jobs on Mustang. The sample code is an MPI "Hello World". |
Job_Array_Example | Sample code to generate binary and data and job script for using job arrays. |
Job_Dependencies_Example | Example code, scripts, and instructions demonstrating how to set up job dependencies, so that a job runs depending on how one or more other jobs execute, or performs some action that one or more other jobs require before they execute. |
MPI_Example | Sample code and makefile for compiling MPI code to run on Mustang, and sample script for running MPI jobs. |
OpenMP_Example | Sample C code and makefile for compiling OpenMP code to run on Mustang, and sample scripts for running OpenMP jobs. |
PE_Pinning_Example | Examples demonstrating how to place and/or pin job processing elements, either MPI processes or OpenMP threads, to cores or groups of cores to facilitate more efficient processing and prevent separation of an execution thread from its data and instructions. All combinations of compiler (Intel or GNU) and MPI (SGI MPT or Intel IMPI) are discussed. |
Transfer_Queue_Example | PBS batch script examples for data transfer using the transfer queue. |
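The hybrid hello-world code referenced in Hello_World_Example and Hybrid_Example above follows the usual MPI-plus-OpenMP pattern. The sketch below only illustrates that pattern and is not the repository source.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nranks;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Each MPI rank opens an OpenMP parallel region; every thread
       reports its thread ID and the rank it belongs to. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d on MPI rank %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads(), rank, nranks);
    }

    MPI_Finalize();
    return 0;
}

Compile with an MPI-aware compiler command and the appropriate OpenMP flag (e.g., -qopenmp for Intel or -fopenmp for GNU); the MPI_Compilation example above discusses the Mustang-specific compiler and MPI combinations.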
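The Memory_Usage entry above refers to a site-provided routine. As a rough illustration of the underlying idea on a Linux system such as Mustang, a process can read its own resident set size from /proc/self/status; this sketch is not the repository routine.

#include <stdio.h>
#include <string.h>

/* Return the current resident set size (VmRSS) in kilobytes,
   or -1 if it cannot be determined.  Linux-specific. */
long get_rss_kb(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long rss_kb = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &rss_kb);   /* value reported in kB */
            break;
        }
    }
    fclose(fp);
    return rss_kb;
}

int main(void)
{
    printf("Current resident set size: %ld kB\n", get_rss_kb());
    return 0;
}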
8. Links to Vendor Documentation
8.1. HPE SGI Links
HPE Home: https://www.hpe.com
HPE SGI Documentation Home: https://support.hpe.com
HPE Message Passing Toolkit User Guide
8.2. Red Hat Links
Red Hat Home: http://www.redhat.com
8.3. GNU Links
GNU Home: http://www.gnu.org
GNU Compiler: http://gcc.gnu.org/onlinedocs
8.4. Portland Group (PGI) Links
PGI Home: http://www.pgroup.com
Portland Group Resources Page: http://www.pgroup.com/resources
Portland Group User's Guide: http://www.pgroup.com/doc/pgiug.pdf
8.5. Intel Links
Intel Home: http://www.intel.com
Intel Documentation: http://software.intel.com/en-us/intel-software-technical-documentation
Intel Compiler List: http://software.intel.com/en-us/intel-compilers
8.6. Debugger Links
DDT Tutorials: http://www.allinea.com/tutorials