The HPC facility (SURYA) comprises 16 CPU compute nodes (640 cores) and 4 GPU compute nodes (160 CPU cores, each node equipped with 2 Nvidia Tesla V100 GPUs, giving 8 GPUs and 40,960 CUDA cores in total), along with 8.5 TB of RAM. The facility uses a DDN GRIDScaler parallel file system (~200 TB) delivering ~15 GB/s throughput over a 100 Gbps interconnect network.
When a job is submitted, it is placed in a queue. Different queues are available for different purposes. Users must select the queue from the list below that is appropriate for their computational needs.
| Queue Name | No. of Nodes | x86 Processors | CUDA Cores | Node(s) | Walltime | Max Jobs per User |
| --- | --- | --- | --- | --- | --- | --- |
| CORE160 | 4 | 160 | - | Any CPU node {1-16} | 360 hrs | 1 |
| CORE320 | 8 | 320 | - | Any CPU node {1-16} | 24 hrs | 1 |
| GPU | 1 | 40 | 10,240 | Any GPU node {17-20} | 360 hrs | 1 |
Based on the queuing system given above, the node configurations can be summarized as follows:
| Queue Type | Queue Name | Node Configuration |
| --- | --- | --- |
| CPU | CORE160 | CPU: 160 cores, RAM: 1,536 GB |
| CPU | CORE320 | CPU: 320 cores, RAM: 3,072 GB |
| GPU | GPU | CPU: 40 cores, RAM: 384 GB, 2x Tesla V100 (16 GB each) |
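Before picking a queue, it can help to check the current queue limits and node availability from the login node. The sketch below uses standard PBS/Torque client utilities; their availability on SURYA's login node is assumed and not stated in this guide.

```
# List all configured queues with their limits and current job counts (standard PBS command)
qstat -q

# Show the state of every compute node, e.g. free or job-exclusive
# (standard PBS/Torque utility; output format varies between PBS versions)
pbsnodes -a
```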
```
#!/bin/bash
# Sample job submission script for the CORE160 queue (4 nodes x 40 cores = 160 MPI processes)
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q core160
#PBS -l nodes=4:ppn=40
#PBS -o out.log
#PBS -j oe
#PBS -V

# Load the Intel compiler and MPI environment
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Launch the MPI job on the allocated nodes
mpiexec.hydra -f $PBS_NODEFILE -np 160 "script_name.sh"
./Job_script.sh
exit;
```
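To submit the script above, save it to a file and pass it to qsub; the filename submit_core160.sh below is only a placeholder. The same procedure applies to the CORE320 and GPU scripts that follow.

```
# Submit the CORE160 job script; qsub prints the job ID on success
qsub submit_core160.sh

# Confirm the job is queued (Q) or running (R)
qstat -a
```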
```
#!/bin/bash
# Sample job submission script for the CORE320 queue (8 nodes x 40 cores = 320 MPI processes)
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q core320
#PBS -l nodes=8:ppn=40
#PBS -o out.log
#PBS -j oe
#PBS -V

# Load the Intel compiler and MPI environment
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Launch the MPI job on the allocated nodes
mpiexec.hydra -f $PBS_NODEFILE -np 320 "script_name.sh"
./Job_script.sh
exit;
```
```
#!/bin/bash
# Sample job submission script for the GPU queue
#PBS -u FACULTY_NAME
#PBS -N STUDENT_NAME
#PBS -q gpu
# Request one GPU:
#PBS -l select=1:ncpus=20:ngpus=1
# For two GPUs, use this line instead:
##PBS -l select=2:ncpus=20:ngpus=1
# To request a specific GPU node (17-20), use this line instead:
##PBS -l select=2:ncpus=20:ngpus=1:host=node{17-20}
#PBS -o out.log
#PBS -j oe
#PBS -V

# Load the Intel compiler and MPI environment and make the CUDA libraries visible
module load compilers/intel/parallel_studio_xe_2018_update3_cluster_edition
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES=0,1

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

python your_script_name.py
mpirun -np 2 your_script_name.sh
./Job_script.sh
exit;
```
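To confirm that a GPU job actually sees the requested Tesla V100s, a quick check can be added after the export lines in the script above, or run interactively on the allocated GPU node. nvidia-smi is the standard Nvidia driver utility; it is assumed to be on the default PATH of the GPU nodes.

```
# List the GPUs visible to this job; with CUDA_VISIBLE_DEVICES=0,1 both V100s should appear
nvidia-smi

# Print only the GPU model and memory (supported by recent Nvidia drivers)
nvidia-smi --query-gpu=name,memory.total --format=csv
```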
Useful Commands
```
ssh <username>@172.20.70.12   # log in to the cluster
qsub submit_script.sh         # submit a job script to the queue
qstat {-a, -s, -n}            # check the status of jobs and queues
ssh node{1-20}                # log in to a compute node (1-20)
qdel <job-id>                 # delete a queued or running job
```
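A typical end-to-end session with these commands might look like the following sketch; the username, script name, node number, and job ID are placeholders.

```
# Log in to the cluster
ssh myuser@172.20.70.12

# Submit a job script; qsub prints the job ID (e.g. 1234)
qsub submit_script.sh

# Monitor jobs and their node assignments
qstat -a
qstat -n

# Optionally inspect a compute node the job is running on
ssh node5

# Cancel the job if required
qdel 1234
```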
Usage Guidelines