Running Distributed-Memory Parallel Jobs

The most flexible and scalable type of parallel software uses a library such as MPI. This allows a job to use many CPU cores, distributed across multiple nodes (servers).

For an explanation of the basics of submitting a job, please visit the Running Software page.

A typical batch file for an MPI-based application is shown below:

#!/bin/bash

# Set the name of the job
# (this gets displayed when you get a list of jobs on the cluster)
#SBATCH --job-name="My MPI Job"

# Specify the maximum wall clock time your job can use
# (Your job will be killed if it exceeds this)
#SBATCH --time=3:00:00

# Specify the amount of memory your job needs per CPU core (in MB)
# (Your job will be killed if it exceeds this for a significant length of time)
#SBATCH --mem-per-cpu=1024

# Specify the number of MPI tasks (CPU cores) your job requires
#SBATCH --ntasks=20

# Set up the environment
module load gcc/10.2.0
module load openmpi/4.1.1

# Run the application
echo "My job has started"
mpirun ./my_mpi_program
echo "My job has finished"

This is identical to the basic job submission file except for the #SBATCH --ntasks directive and the mpirun command.

Here, we are telling SLURM that our job will use 20 CPU cores at once but, unlike the OpenMP example above, we do not stipulate how many of these must be on each server. They could all be on one server, there could be 8 on one server and 12 on another, or there could be one task on each of 20 servers – it doesn't matter. This gives SLURM the greatest flexibility when it schedules the job for execution.
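If the distribution does matter (for example, to keep MPI communication within as few servers as possible), SLURM provides directives to constrain it. A sketch using the standard --nodes and --ntasks-per-node options:

```shell
# Alternative to "--ntasks=20": pin the 20 tasks to exactly 2 nodes,
# with 10 tasks placed on each node
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
```

Note that the more constraints you add, the longer your job may wait in the queue, since fewer placements can satisfy the request.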

As we are using an MPI library, we need to load the same module that we used when building the software.

To run an MPI-based application, we use the command mpirun, which is responsible for starting the 20 copies of my_mpi_program and ensuring they can communicate with each other.
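With a recent Open MPI built against SLURM, mpirun normally detects the task count from the job allocation automatically. It can also be given explicitly using the SLURM_NTASKS environment variable, which SLURM sets inside every job; a sketch (this requires an MPI installation, so it is shown for illustration only):

```shell
# Launch one copy of the program per requested task
# (SLURM_NTASKS holds the value of --ntasks, here 20)
mpirun -np "${SLURM_NTASKS}" ./my_mpi_program
```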

As above, the mem-per-cpu option defines the required amount of memory per CPU core, not the total amount required for the application. In this example, 1024 MB (1 GB) is specified, so the total requirement for the 20-task job is 20 GB.
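The arithmetic, as a quick shell check (values taken from the batch script above):

```shell
# total memory = mem-per-cpu (in MB) * number of tasks
mem_per_cpu=1024
ntasks=20
echo "$((mem_per_cpu * ntasks)) MB"   # prints "20480 MB", i.e. 20 GB
```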