Patching and Pausing Jobs
On HPC1, use squeue to see which jobs are running on which nodes:
squeue --long
Tue May 27 13:16:59 2025
JOBID  PARTITION  NAME      USER     STATE    TIME     TIME_LIMI   NODES  NODELIST(REASON)
8957   compute    a1748355  azeezoe  PENDING  0:00     1-00:00:00  1      (Dependency)
8958   compute    a1748355  azeezoe  PENDING  0:00     20:00:00    1      (Dependency)
8955   compute    a1748355  azeezoe  RUNNING  1:25:26  7-00:00:00  1      hpc2
8956   compute    a1748355  azeezoe  RUNNING  1:26:41  7-00:00:00  1      hpc2
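Reading the NODELIST column by eye works for a short queue. As a minimal sketch, squeue can also do the filtering itself and print only the job IDs in the comma-separated form that scontrol accepts. The function name running_jobs_on is hypothetical; -h (no header), -w/--nodelist, -t/--states and -o '%i' (job ID only) are standard squeue options, but check man squeue on HPC1 in case its Slurm version differs.

```shell
#!/usr/bin/env bash
# running_jobs_on NODE  (hypothetical helper name)
# Prints the IDs of RUNNING jobs allocated to NODE as one comma-separated
# list, the same form used with scontrol suspend/resume.
running_jobs_on() {
  local node="$1"
  # -h: no header line
  # -w: only jobs allocated to this node (multi-node jobs that include it too)
  # -t: RUNNING jobs only
  # -o '%i': print the job ID and nothing else, one per line
  # paste -sd, - joins those lines with commas
  squeue -h -w "$node" -t RUNNING -o '%i' | paste -sd, -
}
```

For the queue shown above, running_jobs_on hpc2 would print 8955,8956.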
Note which jobs are running on which node. In this example, suppose hpc2 is about to undergo scheduled maintenance. Pause (suspend) the jobs running on it:
scontrol suspend 8955,8956
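The lookup and the suspend can be combined into one small function. This is a sketch under stated assumptions: suspend_node_jobs is a hypothetical name, and it assumes you have the privileges scontrol requires to suspend jobs (typically an operator, administrator, SlurmUser or root). It prints the suspended job IDs so you can keep a note of them.

```shell
#!/usr/bin/env bash
# suspend_node_jobs NODE  (hypothetical helper name)
# Suspends every RUNNING job on NODE and prints the affected job IDs.
# Assumes the caller is allowed to run "scontrol suspend".
suspend_node_jobs() {
  local node="$1" ids
  # Comma-separated IDs of RUNNING jobs on NODE, e.g. "8955,8956"
  ids=$(squeue -h -w "$node" -t RUNNING -o '%i' | paste -sd, -)
  if [ -z "$ids" ]; then
    echo "No running jobs on $node" >&2
    return 0
  fi
  # Same command as above, with the job list filled in automatically
  scontrol suspend "$ids" || return 1
  echo "Suspended on $node: $ids"
}
```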
Once hpc2 is back online, resume the jobs:
scontrol resume 8955,8956
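The resume step can be sketched the same way, under the same privilege assumption. Instead of relying on IDs noted down earlier, this hypothetical resume_node_jobs function asks squeue for jobs in the SUSPENDED state on the node. That also picks up any job suspended by someone else, so review the printed list if that matters on HPC1.

```shell
#!/usr/bin/env bash
# resume_node_jobs NODE  (hypothetical helper name)
# Resumes every SUSPENDED job on NODE and prints the affected job IDs.
# Assumes the caller is allowed to run "scontrol resume".
resume_node_jobs() {
  local node="$1" ids
  # Suspended jobs stay allocated to the node, so -w still finds them
  ids=$(squeue -h -w "$node" -t SUSPENDED -o '%i' | paste -sd, -)
  if [ -z "$ids" ]; then
    echo "No suspended jobs on $node" >&2
    return 0
  fi
  scontrol resume "$ids" || return 1
  echo "Resumed on $node: $ids"
}
```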