The HPC facility (SURYA) comprises a cluster of 16 CPU compute nodes (640 cores) and 4 GPU compute nodes (160 CPU cores, with 2 Nvidia Tesla V100 GPUs per node providing 40,960 CUDA cores in total), along with 8.5 TB of RAM. The facility uses a ~200 TB DDN GRIDScaler parallel file system delivering 15 GBps throughput over a 100 Gbps interconnect network.
When a job is submitted, it is placed in a queue. Different queues are available for different purposes, and the user must select the one from the list below that is appropriate for his/her computational needs.
Name of Queue | No. of Nodes | No. of x86 Processors | CUDA Cores | Name of Node | Walltime | MaxJob |
CORE160 | 4 | 160 | - | Any CPU node {1-16} | 360 hrs | 1 per user |
CORE320 | 8 | 320 | - | Any CPU node {1-16} | 24 hrs | 1 per user |
GPU | 1 | 40 | 10,240 | Any GPU node {17-20} | 360 hrs | 1 per user |
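The limits in the table above can also be checked directly on the cluster with the standard PBS queue-listing commands; a minimal sketch (the lowercase queue name follows the sample job scripts later in this document):
# List all queues with their job counts and configured limits
qstat -q
# Show the full attribute listing for one queue, e.g. core160
qstat -Qf core160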
Based on the queuing system given above, the node configurations can be summarized as follows:
Queue Type | Queue Name | Node Configuration |
CPU | CORE160 | CPU : 160, RAM : 1,536 GB |
CPU | CORE320 | CPU : 320, RAM : 3,072 GB |
GPU | GPU | CPU : 40, RAM : 384 GB, 2x Tesla-V100 : 16GB |
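The per-node resources summarized above can likewise be inspected on the cluster itself; a minimal sketch using the standard PBS pbsnodes command (the node name node17 is only an assumption based on the GPU node range {17-20}):
# Show the resources and state reported for every compute node
pbsnodes -a
# Inspect a single node, e.g. an assumed GPU node name from the {17-20} range
pbsnodes node17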
Sample job submission script for the CORE160 queue:
#!/bin/bash
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q core160
#PBS -l nodes=4:ppn=40
#PBS -o out.log
#PBS -j oe
#PBS -V
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition
cd $PBS_O_WORKDIR
# Launch 160 MPI processes across the allocated nodes
mpiexec.hydra -f $PBS_NODEFILE -np 160 "script_name.sh"
./Job_script.sh
exit;
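Assuming the script above is saved as, for example, core160_job.sh (the filename is arbitrary), it can be submitted and monitored as follows:
# Submit the job; the queue is already set by the #PBS -q directive in the script
qsub core160_job.sh
# List only your own jobs
qstat -u <username>
# Show the nodes assigned to a running job
qstat -n <job-id>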
Sample job submission script for the CORE320 queue:
#!/bin/bash
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q core320
#PBS -l nodes=8:ppn=40
#PBS -o out.log
#PBS -j oe
#PBS -V
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition
cd $PBS_O_WORKDIR
# Launch 320 MPI processes across the allocated nodes
mpiexec.hydra -f $PBS_NODEFILE -np 320 "script_name.sh"
./Job_script.sh
exit;
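As a variation on the script above (not part of the official template), the MPI process count can be derived from $PBS_NODEFILE instead of being hard-coded, so the launch line stays consistent if the nodes/ppn request changes:
# $PBS_NODEFILE contains one line per allocated core (8 nodes x 40 ppn = 320 lines here)
NPROCS=$(wc -l < $PBS_NODEFILE)
mpiexec.hydra -f $PBS_NODEFILE -np $NPROCS "script_name.sh"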
Sample job submission script for the GPU queue:
#!/bin/bash
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q gpu
# Use only ONE of the following resource requests:
#PBS -l select=1:ncpus=20:ngpus=1 (For ONE GPU)
#PBS -l select=2:ncpus=20:ngpus=1 (For TWO GPUs)
#PBS -l select=2:ncpus=20:ngpus=1:host=node{17-20} (For a specific node)
#PBS -o out.log
#PBS -j oe
#PBS -V
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES=0,1
cd $PBS_O_WORKDIR
python your_script_name.py
mpirun -np 2 your_script_name.sh
./Job_script.sh
exit;
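Before starting the main GPU computation, it can be useful to confirm that the allocated GPUs are actually visible inside the job; a minimal check that could be added after the export lines in the script above (assuming the NVIDIA driver utilities are available on the GPU nodes):
# Print the GPUs visible to this job and the devices selected via CUDA_VISIBLE_DEVICES
nvidia-smi
echo "CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES"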
Useful Commands
ssh <username>@172.20.70.12 (log in to the HPC facility)
qsub submit_script.sh (submit a job script to the queue)
qstat {-a, -s, -n} (check the status of queued and running jobs)
ssh node{1-20} (log in to a specific compute node)
qdel <job-id> (delete a submitted job)
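A typical session combining these commands might look as follows (the username, script name, and job ID are placeholders):
# Log in to the facility, submit a job, monitor it, and cancel it if required
ssh user01@172.20.70.12
qsub submit_script.sh        # prints a job ID, e.g. 1234.<server>
qstat -a                     # job state changes from Q (queued) to R (running)
qdel 1234.<server>           # remove the job if it is no longer needed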
Usage Guidelines