Running Python Jobs

This section describes how to run Python jobs on the cluster. There are two methods, depending on whether you require a virtual environment (virtualenv).

For the basics of running a job on the cluster please read the Running Software page. If your Python program is designed to utilise multiple cores on a server then reading the Running Shared-Memory Parallel Jobs page is also recommended (particularly the explanation of the options ntasks and ntasks-per-node).

Without virtualenv

To use Python on the cluster, you must first load the Python module, e.g.

module load python/3.9

If your program requires specialist packages then these can be installed within your account using the command:

pip3.9 install package_name
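
After installing, it can be useful to confirm that Python can actually locate the packages before submitting a job. The following is a minimal sketch only: the file name check_packages.py and the helper is_installed are hypothetical names chosen for illustration, and the check only covers top-level package names.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Package names are taken from the command line, e.g.
    #   python3.9 check_packages.py numpy scipy
    # (check_packages.py is a hypothetical file name.)
    for name in sys.argv[1:]:
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Note that the import name of a package is not always the same as the name used with pip, so a MISSING result may simply mean the wrong name was checked.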

It is possible to run your program on the login node for a short time for testing or debugging purposes using the command:

python3.9 my_python_program.py

However, for longer runs you must submit a job to the queueing system so it gets scheduled on the compute nodes. A typical job submission file is shown below:

#!/bin/bash

# Set the name of the job
# (this gets displayed when you list the jobs on the cluster)
#SBATCH --job-name="My Python Code"

# Specify the maximum wall clock time your job can use
# (Your job will be killed if it exceeds this)
#SBATCH --time=3:00:00

# Specify the amount of memory your job needs per CPU core (in MB)
# (Your job will be killed if it exceeds this)
#SBATCH --mem-per-cpu=1024

# Specify the number of cpu cores your job requires
#SBATCH --ntasks=1

# Set up the environment
module load python/3.9.5

# Run the application
python3.9 my_python_program.py

This is identical to a standard job submission file except for the final two commands, which load the Python module and run the program.

Firstly, since we are using Python, we need to load the Python module.

The Python program can then be executed using the command at the bottom of the file.
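
When debugging a job it can help if the program records what the scheduler actually gave it. The sketch below assumes the cluster runs Slurm (as the #SBATCH directives suggest) and reads the SLURM_JOB_ID and SLURM_NTASKS environment variables that Slurm sets inside a job. The function name describe_job and the fallback values used outside a job are assumptions made for illustration.

```python
import os
import sys


def describe_job(env=os.environ) -> dict:
    """Collect basic facts about the interpreter and the Slurm allocation."""
    return {
        # Version of the interpreter that is actually running the program
        "python": sys.version.split()[0],
        # Slurm sets SLURM_JOB_ID inside a job; the fallback text is an
        # assumption for runs on the login node
        "job_id": env.get("SLURM_JOB_ID", "not in a job"),
        # SLURM_NTASKS mirrors --ntasks; default to 1 when it is absent
        "ntasks": int(env.get("SLURM_NTASKS", "1")),
    }


if __name__ == "__main__":
    for key, value in describe_job().items():
        print(f"{key}: {value}")
```

Calling describe_job() at the start of my_python_program.py places these details at the top of the job's output file, which makes it easier to match output to a particular submission.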

With virtualenv

If your Python program requires specialist packages but you do not want to install them globally in your account, or you have multiple Python programs that require conflicting packages, then you will need to use virtualenv.

To set up a virtual environment, you will need to first load the Python module:

module load python/3.9

A virtual environment can then be created using the following command:

python3.9 -m venv my_virtual_env_name

For debugging or testing purposes, it is permissible to run interactively on the login node. This can be achieved by first activating your virtual environment, and then installing packages and running your Python program as normal.

source ./my_virtual_env_name/bin/activate
pip3.9 install my_package
python3.9 my_python_program.py
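
If it is unclear whether the environment is active (for example, inside a job script), the interpreter itself can be asked. The sketch below relies on the documented behaviour of environments created with python -m venv, where sys.prefix differs from sys.base_prefix. The function name in_virtualenv is a hypothetical name chosen for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter was started from a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was
    # created from. Outside a venv the two are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("packages install under:", sys.prefix)
```

Running this before and after the source command should show the prefix changing to the my_virtual_env_name directory.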

For longer runs you must submit a job to the queueing system so it gets scheduled on the compute nodes. A typical job submission file is shown below:

#!/bin/bash

# Set the name of the job
# (this gets displayed when you list the jobs on the cluster)
#SBATCH --job-name="My Python Code"

# Specify the maximum wall clock time your job can use
# (Your job will be killed if it exceeds this)
#SBATCH --time=3:00:00

# Specify the amount of memory your job needs per CPU core (in MB)
# (Your job will be killed if it exceeds this)
#SBATCH --mem-per-cpu=1024

# Specify the number of cpu cores your job requires
#SBATCH --ntasks=1

# Set up the environment
module load python/3.9.5

# Activate the Virtual Environment
source ./my_virtual_env_name/bin/activate

# Run the application
python3.9 my_python_program.py

This is identical to a standard job submission file except for the final three commands.

Firstly, since we are using Python, we need to load the Python module.

We then need to activate the virtual environment that was created previously.

The Python program can then be executed in the normal manner.