Comparison of Systems

Below we compare the Cori, Theta, and Titan systems in depth, covering hardware, software environment, and the job submission process, to help Office of Science users make use of multiple resources.

Hardware In-Depth

System                   | Cori                                     | Theta                                    | Titan
Facility                 | NERSC                                    | ALCF                                     | OLCF
Model                    | Cray XC40                                | Cray XC40                                | Cray XK7
Processor                | Intel Xeon Phi 7250 ("Knights Landing")  | Intel Xeon Phi 7230 ("Knights Landing")  | AMD Opteron 6274 ("Interlagos")
Processor Cores          | 68                                       | 64                                       | 16 CPU cores (2688 SP / 896 DP CUDA cores on K20X GPU)
Processor Base Frequency | 1.4 GHz                                  | 1.3 GHz                                  | 2.2 GHz
Processor Max Frequency  | 1.6 GHz                                  | 1.5 GHz                                  | 3.1 GHz (disabled)
On-Device Memory         | 16 GB MCDRAM                             | 16 GB MCDRAM                             | (6 GB GDDR5 on K20X GPU)
Processor DRAM           | 96 GB DDR4                               | 192 GB DDR4                              | 32 GB DDR3
Accelerator              | (none)                                   | (none)                                   | NVIDIA Tesla K20X ("Kepler")
Nodes                    | 9,688                                    | 3,624                                    | 18,688
Perf. per Node           | 2.6 TF                                   | 2.6 TF                                   | 1.4 TF
Node-Local Storage       | (none)                                   | 128 GB SSD                               | (none)
External Burst Buffer    | 1.8 PB                                   | (none)                                   | (none)
Parallel File System     | 30 PB Lustre                             | 10 PB Lustre                             | 28 PB Lustre
Interconnect             | Cray Aries                               | Cray Aries                               | Cray Gemini
Topology                 | Dragonfly                                | Dragonfly                                | 3D torus
Peak Perf.               | 30 PF                                    | 10 PF                                    | 27 PF

Software Environment

System                          | Cori                                                                                              | Theta                                                                                             | Titan
Software Environment Management | modules                                                                                           | modules                                                                                           | modules
Batch Job Scheduler             | Slurm                                                                                             | Cobalt                                                                                            | PBS

Compilers
Intel                           | (default) module load PrgEnv-intel                                                                | (default) module load PrgEnv-intel                                                                | module load PrgEnv-intel
Cray                            | module load PrgEnv-cray                                                                           | module load PrgEnv-cray                                                                           | module load PrgEnv-cray
GNU                             | module load PrgEnv-gnu                                                                            | module load PrgEnv-gnu                                                                            | module load PrgEnv-gnu
PGI                             | n/a                                                                                               | n/a                                                                                               | (default) module load PrgEnv-pgi
Clang                           | n/a                                                                                               | module load PrgEnv-llvm                                                                           | n/a

Interpreters
R                               | gcc + MKL: module load R; Cray: module load cray-R                                                | module load cray-R                                                                                | module load r
Python 2                        | Anaconda + Intel MKL: module load python/2.7-anaconda                                             | Cray: module load cray-python; Intel: module load intelpython26                                   | module load python_anaconda
Python 3                        | Anaconda + Intel MKL: module load python/3.5-anaconda                                             | Intel: module load intelpython35                                                                  | module load python_anaconda3

Libraries
FFT                             | FFTW: module load fftw; Cray FFTW: module load cray-fftw; Intel MKL: automatic with Intel compilers | FFTW: module load fftw; Cray FFTW: module load cray-fftw; Intel MKL: automatic with Intel compilers | FFTW: module load fftw; Cray FFTW: module load cray-fftw
Cray LibSci                     | (default) module load cray-libsci                                                                 | module load cray-libsci                                                                           | module load cray-libsci
Intel MKL                       | automatic with Intel compilers                                                                    | automatic with Intel compilers                                                                    | automatic with Intel compilers
Trilinos                        | module load cray-trilinos                                                                         | module load cray-trilinos                                                                         | module load cray-trilinos
PETSc                           | module load cray-petsc                                                                            | module load cray-petsc                                                                            | module load cray-petsc
SHMEM                           | module load cray-shmem                                                                            | module load cray-shmem                                                                            | module load cray-shmem
memkind                         | module load cray-memkind                                                                          | module load cray-memkind                                                                          | n/a

I/O Libraries
HDF5                            | module load cray-hdf5                                                                             | module load cray-hdf5                                                                             | module load cray-hdf5
NetCDF                          | module load cray-netcdf                                                                           | module load cray-netcdf                                                                           | module load cray-netcdf
Parallel NetCDF                 | module load cray-parallel-netcdf                                                                  | module load cray-parallel-netcdf                                                                  | module load cray-parallel-netcdf

Performance Tools and APIs
Intel VTune Amplifier           | module load vtune                                                                                 | source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh                                               | n/a
CrayPAT                         | module load perftools-base && module load perftools                                               | module load perftools                                                                             | module load perftools
PAPI                            | module load papi                                                                                  | module load papi                                                                                  | module load papi
Darshan                         | (default) module load darshan                                                                     | module load darshan                                                                               | module load darshan

Other Packages and Frameworks
Shifter                         | (part of base system)                                                                             | module load shifter                                                                               | n/a

Compiler Wrappers

Use these wrappers to properly cross-compile your source code for the compute nodes of the systems, and bring in appropriate headers for MPI, etc.

System  | Cori | Theta | Titan
C++     | CC   | CC    | CC
C       | cc   | cc    | cc
Fortran | ftn  | ftn   | ftn
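For example, on any of the three systems the same wrapper invocations cross-compile code for the compute nodes, with MPI headers and libraries added automatically (the source and output file names here are hypothetical):

```
cc  -o hello_c   hello.c     # C
CC  -o hello_cpp hello.cpp   # C++
ftn -o hello_f   hello.f90   # Fortran
```

The wrappers invoke whichever compiler the currently loaded PrgEnv module selects, so the same command line works after switching programming environments.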

Job Submission

Theta

Job Script

#!/bin/bash
#COBALT -t 30
#COBALT --attrs mcdram=cache:numa=quad
#COBALT -A <yourALCFProjectName>
echo "Starting Cobalt job script"
export n_nodes=$COBALT_JOBSIZE
export n_mpi_ranks_per_node=32
export n_mpi_ranks=$(($n_nodes * $n_mpi_ranks_per_node))
export n_openmp_threads_per_rank=4
export n_hyperthreads_per_core=2
export n_hyperthreads_skipped_between_ranks=4
aprun -n $n_mpi_ranks -N $n_mpi_ranks_per_node \
  --env OMP_NUM_THREADS=$n_openmp_threads_per_rank -cc depth \
  -d $n_hyperthreads_skipped_between_ranks \
  -j $n_hyperthreads_per_core \
  <executable> <executable args>

The #COBALT -t 30 line requests a 30-minute walltime. In general, #COBALT lines are equivalent to the corresponding qsub command-line arguments.
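The rank count in the script is derived from the node count that Cobalt reports. A minimal sketch of the same arithmetic, with COBALT_JOBSIZE hard-coded to the 512 nodes requested at submission (Cobalt sets the real value at run time):

```shell
# Rank arithmetic from the Cobalt script above; COBALT_JOBSIZE is
# hard-coded here only for illustration.
COBALT_JOBSIZE=512
n_nodes=$COBALT_JOBSIZE
n_mpi_ranks_per_node=32
n_mpi_ranks=$(( n_nodes * n_mpi_ranks_per_node ))
echo "$n_mpi_ranks MPI ranks"   # 16384 MPI ranks
```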

Job Submit Command

qsub -n 512 ./theta_script.sh
The -n 512 argument requests 512 nodes.

Titan

Job Script

#!/bin/bash
#PBS -A <yourOLCFProjectName>
#PBS -N test
#PBS -j oe
export n_nodes=$PBS_NUM_NODES
export n_mpi_ranks_per_node=8
export n_mpi_ranks=$(($n_nodes * $n_mpi_ranks_per_node))

cd $MEMBERWORK/<yourOLCFProjectName>
date

export OMP_NUM_THREADS=2

aprun -n $n_mpi_ranks -N $n_mpi_ranks_per_node \
  -d 2  <executable> <executable args>

Job Submit Command

qsub -l nodes=512 ./titan_script.sh
The -l nodes=512 argument requests 512 nodes (this can also be put in the batch script).
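The Titan script's layout can be sanity-checked with the same arithmetic, again hard-coding the 512-node request for illustration: 8 MPI ranks per node, each running 2 OpenMP threads, exactly fills the 16 Opteron cores on a node.

```shell
# Rank/thread layout for the Titan script above (node count
# hard-coded; PBS supplies the real value at run time).
n_nodes=512
n_mpi_ranks_per_node=8
omp_threads_per_rank=2
n_mpi_ranks=$(( n_nodes * n_mpi_ranks_per_node ))
threads_per_node=$(( n_mpi_ranks_per_node * omp_threads_per_rank ))
echo "$n_mpi_ranks ranks, $threads_per_node threads per node"
# 4096 ranks, 16 threads per node (matching the 16 CPU cores)
```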

Cori

NERSC provides a page on the MyNERSC website that generates job scripts automatically from a specified runtime configuration. An example script is shown below, in which a code uses 512 nodes of Xeon Phi with MCDRAM configured in "flat" mode, with 4 MPI processes per node and 34 OpenMP threads per MPI process, using 2 hyper-threads per physical core of Xeon Phi:

Job Script

#!/bin/bash
#SBATCH -N 512
#SBATCH -C knl,quad,flat
#SBATCH -p debug
#SBATCH -J myapp_run1
#SBATCH --mail-user=johndoe@nersc.gov
#SBATCH --mail-type=ALL
#SBATCH -t 00:30:00

#OpenMP settings:
export OMP_NUM_THREADS=34
export OMP_PLACES=threads
export OMP_PROC_BIND=spread


#run the application:
srun -n 2048 -c 68 --cpu_bind=cores numactl -p 1 myapp.x
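
The srun arguments follow from the run configuration described above. A sketch of the arithmetic, with the example's values (512 KNL nodes, 4 ranks per node, 2 hyper-threads per core) hard-coded:

```shell
# How the srun/OpenMP settings above are derived for a KNL node
# with 68 physical cores and 4 hardware threads per core.
nodes=512
ranks_per_node=4
hyperthreads_used_per_core=2
cores_per_node=68
hw_threads_per_core=4

total_ranks=$(( nodes * ranks_per_node ))                                    # srun -n
cpus_per_task=$(( cores_per_node * hw_threads_per_core / ranks_per_node ))   # srun -c
omp_threads=$(( cores_per_node / ranks_per_node * hyperthreads_used_per_core ))  # OMP_NUM_THREADS

echo "srun -n $total_ranks -c $cpus_per_task, OMP_NUM_THREADS=$omp_threads"
# srun -n 2048 -c 68, OMP_NUM_THREADS=34
```

Each rank thus owns 17 physical cores (68 logical CPUs), and 34 OpenMP threads spread across them use 2 of the 4 hardware threads per core.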