Running Software

The cluster is a shared resource available to all researchers in the Faculty of Science and Engineering. At any one time there may be many users logged in and accessing the filesystem, hundreds of jobs running on the compute nodes, and a hundred more queued up waiting for resources.

As such, there are a few rules and guidelines regarding its use.

The most important is “Users must not run long/large jobs on the Login Node”. The login node is the server in the cluster that you log in to, and it is therefore shared amongst all current users, who may be transferring files, editing files or compiling their software. Running a large or computationally intensive task on this node will degrade its performance and responsiveness for everyone else. Should a user be discovered doing this, their programs will be terminated without notice.

An additional guideline, relevant only to users of the Astute Cluster (enhpc.swan.ac.uk), is – always store your data files under the lustre folder. This folder sits on a large parallel filestore that is designed to cope with large amounts of I/O from hundreds of cores simultaneously.

Submitting Jobs

To run jobs on the compute nodes you will need to submit a job submission script. This is a small text file that describes the resources the job requires and specifies the commands that must be run to complete the job.

This is then submitted to the queueing system which will decide when and where to run the job in order to most effectively utilise the available compute nodes.

Creating a Job Submission Script (Single CPU Jobs)

The submission file has a fairly standard structure regardless of what kind of software you want to run on the cluster. An example is shown below:

#!/bin/bash
# (Section 1)

# Section 2: Set the name of the job
# (this gets displayed when you list the jobs on the cluster)
#SBATCH --job-name="My Wave Simulation"

# Section 3: Specify the maximum wall clock time your job can use
# (Your job will be killed if it exceeds this)
#SBATCH --time=3:00:00

# Section 4: Specify the amount of memory your job needs (in MB)
# (Your job will be killed if it exceeds this)
#SBATCH --mem-per-cpu=1024

# Section 5: Specify the number of CPU cores your job requires
#SBATCH --ntasks=1

# Section 6: Set up the environment
module load gcc/10.2.0

# Section 7: Run the application
echo My job has started
./my_program
echo My job has finished

Section 1 – This specifies the shell used to run your script. On the cluster this is invariably bash, and so it should remain as above.

Section 2 – This is our first #SBATCH directive – it tells the queueing system, Slurm, the name of the job. The chosen name is displayed when you list the jobs in the queue using the squeue command, which makes it easier to identify your jobs.

Section 3 – REQUIRED – This line tells Slurm the maximum amount of time your job is allowed to run, specified in hours, minutes and seconds. Your job may finish well within this limit, but if it is still running when the limit is reached it will be terminated shortly afterwards.
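
For example, to request 30 minutes of wall clock time you would use:

#SBATCH --time=0:30:00

while a limit of two days can be written using the days-hours:minutes:seconds form:

#SBATCH --time=2-00:00:00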

Section 4 – REQUIRED – This line defines the maximum amount of memory (per CPU core) that your job requires. The default units are MB. Any job that exceeds this amount of memory will be terminated.
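
A unit suffix can be given instead of a bare number if you prefer, for example to request 4 GB per core:

#SBATCH --mem-per-cpu=4G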

Section 5 – REQUIRED – The number of CPU cores your job requires. For any software that is not designed to run in parallel, this will be 1.

The above sections simply inform Slurm of your resource requirements for the job. The next two sections contain the actual commands executed by Linux when your job begins.

Section 6 – Most software requires one or more modules to be loaded in order to operate correctly. Users tend to follow one of two standard practices.

  1. A user may load their required modules in their .bashrc file. The advantage of this method is that it ensures the correct modules are loaded as soon as they log in and whenever they submit a job. The disadvantage is that if different pieces of software require conflicting modules, this quickly becomes confusing and error-prone.
  2. A user loads the required modules as and when they need them. This has the advantage of flexibility, but it does mean the modules need to be loaded manually after logging in.

If option 2 is used then the required modules will need to be loaded in the batch file (as shown above).
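
For example, the batch file might start from a clean environment and then load only the modules the job needs. The module names below are illustrative only; use module avail to see what is installed on the cluster.

# Start from a clean environment so previously loaded modules cannot interfere
module purge

# Load the modules required by this job (example names only)
module load gcc/10.2.0
module load openmpi/4.1.0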

Section 7 – The final section contains the commands used to run your software.
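
By default, anything these commands print to the screen is captured by Slurm and written to a file named slurm-<jobid>.out in the directory the job was submitted from. A typical final section might therefore look like this, where my_program and the file names are placeholders for your own software:

# Run the application
# (my_program, input.dat and results.txt are placeholder names)
echo Job started on $(hostname) at $(date)
./my_program input.dat > results.txt
echo Job finished at $(date)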

Job Submission

To submit a job to the cluster, use the sbatch command as shown below:

sbatch my_submission_file

This will display a job identifier which is unique to the submitted job. This can be used later to query or kill your job.
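
For example (the job identifier shown here is illustrative):

sbatch my_submission_file
Submitted batch job 123456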

Monitoring Your Job

The squeue command is used to list the jobs either in the queue or currently running on the cluster. It has a number of options to alter its behaviour.

To list all jobs use:

squeue

To list just your jobs use:

squeue --user=username
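
The output contains one line per job and will look something like this (the partition and node names here are illustrative and will differ on your cluster):

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
123456  compute  My Wave username  R    1:23:45      1 node001

The ST column gives the job state: R means the job is running, while PD means it is pending (still queued).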

To get more detailed information about a particular job, the scontrol command can be used:

scontrol show job=jobId
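
The output lists the job's settings one field at a time, including details such as JobState, RunTime and TimeLimit. For example, to see how much of its time limit a particular job has used so far (the job identifier here is illustrative):

scontrol show job=123456 | grep -E 'RunTime|TimeLimit'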

Killing a Job

Any job that has been submitted to the queue can be removed using the scancel command:

scancel jobId

If the job is queued waiting to run, then it is removed from the queue. If it is currently running, then it is terminated and removed from the queue.
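
If you need to remove all of your jobs at once, scancel also accepts a user name:

scancel --user=username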


Other Job Submission Types

For examples of submitting other types of jobs please follow the links below:

Shared-Memory Parallel Jobs – This is for applications that can make use of multiple CPU cores, but only when they are all on the same server. These jobs are often referred to as OpenMP, Multi-Core or Multi-Threaded.
Distributed-Memory Parallel Jobs – This is for applications that can make use of multiple CPU cores distributed across many servers. These jobs invariably use some kind of message passing library such as MPI.
Python Jobs – This is for applications written in Python. Running Python programs with and without virtualenv is covered.