High Performance Computing Servers

Important Warning

What follows might be bullshit. I am just trying to make sense of things based on my experience and my reading, but I am not an expert in HPC.

Use GPU on a server

Note that a server having GPU support does not mean that every node has a GPU. For example, it is very unlikely that the login node has one, so you cannot test GPU code on the login node.

You can check the available resources with the command sinfo.
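As a minimal sketch, assuming SLURM is installed on the machine, the default sinfo output can be narrowed to the columns that matter here. The function name list_partitions is hypothetical. The format fields are standard sinfo ones, and GPUs normally show up in the generic resources (GRES) column, although how they are labelled depends on how the cluster is configured.

```shell
# Hypothetical helper: list partitions with node count, state, and GRES.
list_partitions() {
    # Fall back to a message when SLURM is not available on this machine.
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: this does not look like a SLURM machine"
        return 0
    fi
    # %P = partition, %D = number of nodes, %t = node state (compact),
    # %G = generic resources (GRES), where GPUs are usually listed.
    sinfo --format="%20P %6D %10t %G"
}

list_partitions
```

Partitions whose last column mentions gpu are the ones to look at when you need a GPU node.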

Access a computing node

When you connect to a server you usually land on a login node, where your filesystem is mounted. The login node is a machine that you use to submit jobs to the computing nodes.

Alternatively, you can open an interactive shell directly on a computing node. Assuming the server runs Linux with SLURM as the job scheduler, you can do that with the following command:

srun --pty <OTHER SLURM OPTIONS> /bin/bash

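If your software uses a GPU, you need to ask for a GPU node. To do that, you usually need to specify a partition that has GPU nodes and the number of GPUs you need. You can check the available partitions with the command sinfo.

For example, on my server I run the following (sgpu_short is the name of a GPU partition there, so yours will probably differ):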

srun --pty --partition=sgpu_short --gpus=2 /bin/bash
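Once the interactive shell opens, it can help to confirm that you really are on a computing node and that the GPUs are visible. The sketch below is one way to do it, under a few assumptions: the GPUs are NVIDIA ones (so nvidia-smi is the tool to query them), and SLURM is configured to export CUDA_VISIBLE_DEVICES for GPU allocations, which is common but not guaranteed. The function name describe_node is hypothetical.

```shell
# Hypothetical helper: report where this shell runs and which GPUs it sees.
describe_node() {
    echo "host: $(uname -n)"
    # SLURM sets SLURM_JOB_ID inside a job; it is unset on a login node.
    echo "job: ${SLURM_JOB_ID:-none}"
    echo "partition: ${SLURM_JOB_PARTITION:-none}"
    # Assumption: the cluster exports CUDA_VISIBLE_DEVICES for GPU jobs.
    echo "visible GPUs: ${CUDA_VISIBLE_DEVICES:-unset}"
    # Assumption: NVIDIA GPUs, so nvidia-smi is used to list them.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L || echo "nvidia-smi found, but no GPU is reachable"
    else
        echo "nvidia-smi not found: no NVIDIA driver on this node"
    fi
}

describe_node
```

Run on the login node, this should report no job and most likely no GPU. Run inside the srun session, it should show a job ID, the partition you asked for, and the GPUs you were given.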