* Smith [#ofcf0768]

"Smith" is a computer cluster based on Intel and Intel-compatible CPUs.

#contents

** Login nodes [#b65445bf]

To use the "Smith" system, log in to one of the following nodes:
- [smith] 133.1.116.161
- [rafiki] 133.1.116.162
- [tiamat] 133.1.116.211

To use the "sb100" system, use the following node:
- [sb100] 133.1.116.165

** How to log in to the login node [#y3d6531e]

To log in to "smith", type
 $ ssh -l [userID] 133.1.116.161
or
 $ ssh [userID]@133.1.116.161

If you need X11 forwarding, use
 $ ssh -Y -l [userID] 133.1.116.161
or
 $ ssh -Y [userID]@133.1.116.161

Currently, you get the following message upon login:
 -bash: /usr/local/g09/D01/g09/bsd/g09.profile: Permission denied
It is harmless and does not affect your work in most cases.

NOTE: When you log in for the first time, change your initial password by typing
 $ yppasswd
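If you log in often, a host alias in your local SSH client configuration saves typing. A minimal sketch, assuming an OpenSSH client on your own machine; the alias name and user ID below are placeholders:

 # ~/.ssh/config on your local machine (hypothetical entry)
 Host smith
     HostName 133.1.116.161
     User your_userID        # replace with your actual user ID
     ForwardX11 yes
     ForwardX11Trusted yes   # these two lines are the equivalent of "ssh -Y"

With this entry in place, "$ ssh smith" behaves like the commands above.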
** How to submit your jobs [#pa13b1de]

To execute your program, use the queueing system, usually via a job script (see below). For instance, to execute a script "job.sh" on a node (24 cores) of group 10, type
 $ qsub -q xh1.q -pe x24 24 job.sh
Note that the group (queue) and the number of cores can also be specified in the job script.

To see the job status, type
 $ qstat
To see the job status of a specific user, type
 $ qstat -u [user ID]
To cancel a job, use
 $ qdel [job ID]
where the job ID can be obtained with qstat (the number appearing in the first column).

*** Examples of job script [#u8b8a717]

In the following, an example is listed for each group (queue). With these scripts you just type
 $ qsub job.sh
and do not have to specify the queue group and the number of processors explicitly.

- Group 4
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xe1.q
 #$ -pe x8 8
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 mpirun ./a.out < input.dat > output.dat

- Group 5
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xe2.q
 #$ -pe x12 12
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 mpirun ./a.out < input.dat > output.dat

- Group 7 (sb100)
-- Hybrid parallelization (e.g. 12 cores as 2 MPI processes with 6 OpenMP threads each; x6 allocates 6 slots per node, so 12 slots span 2 nodes and one MPI process is started per node)
 #$ -S /bin/bash
 #$ -cwd
 #$ -q all.q
 #$ -pe x6 12
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 export OMP_NUM_THREADS=6
 mpirun -perhost 1 -np $NHOSTS ./a.out < input.dat > output.dat
-- Flat parallelization (12 cores, one MPI process per core)
 #$ -S /bin/bash
 #$ -cwd
 #$ -q all.q
 #$ -pe x6 12
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 mpirun -np $NSLOTS ./a.out < input.dat > output.dat

- Group 8 (the hostfile lines are explained after this list)
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xs2.q
 #$ -pe x16 16
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_ADJUST_ALLGATHERV=2
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat

- Group 9
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xi1.q
 #$ -pe x16 16
 #$ -N JOB_NAME
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_ADJUST_ALLGATHERV=2
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat

- Group 10
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xh1.q
 #$ -pe x24 48
 #$ -N JOB_NAME
 #$ -j y
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_FABRICS=shm:ofa
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat

- Group 11
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xh2.q
 #$ -pe x24 48
 #$ -N JOB_NAME
 #$ -j y
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_FABRICS=shm:ofa
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat

- Group 13
 #$ -S /bin/bash
 #$ -cwd
 #$ -q xb1.q
 #$ -pe x32 32
 #$ -N JOB_NAME
 #$ -j y
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_FABRICS=shm:ofa
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat

- Group 14
 #$ -S /bin/bash
 #$ -cwd
 #$ -q x17.q
 #$ -pe x32 32
 #$ -N JOB_NAME
 #$ -j y
 source /opt/setting/2016.4/intel-compiler.sh
 source /opt/setting/2016.4/intel-mpi.sh
 # The above settings should be consistent with those used in the compilation
 MPI_COMMAND=mpirun
 export I_MPI_PIN=1
 export I_MPI_FABRICS=shm:dapl
 export OMP_NUM_THREADS=1
 cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
 $MPI_COMMAND ./a.out < input.dat > output.dat
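The last three lines of the group 8-14 scripts convert the scheduler's host list into Intel MPI's host:count notation. Under Grid Engine, $PE_HOSTFILE contains one line per allocated node (hostname, slot count, queue instance, processor range), and the awk one-liner divides each slot count by OMP_NUM_THREADS to obtain the number of MPI processes per node. A stand-alone sketch with made-up host names and counts, runnable outside the scheduler:

 # Hypothetical $PE_HOSTFILE content for a 2-node x24 job, fed to the same awk line
 export OMP_NUM_THREADS=2
 printf 'xh01 24 xh1.q@xh01 <NULL>\nxh02 24 xh1.q@xh02 <NULL>\n' \
   | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }'
 # prints "xh01:12" and "xh02:12", i.e. 12 MPI processes per node;
 # with OMP_NUM_THREADS=1, as in the scripts above, it prints "xh01:24" etc.

Note that awk reads the variable via ENVIRON, so OMP_NUM_THREADS must be exported, not merely assigned.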
** Computer nodes and queues [#gcad41b2]

| Group | Proc. | #Cores/#CPUs per node | Submission node | Queue | Parallel env. | Inter-node |h
| 4 | Xeon | 8/2 | smith/rafiki/tiamat | xe1.q | x8 | |
| 5 | Xeon | 12/2 | smith/rafiki/tiamat | xe2.q | x12 | |
| 7 | Core i7 Sandy Bridge | 6/1 | sb100 | all.q | x6 | |
| 8 | Xeon Sandy Bridge | 16/2 | smith/rafiki/tiamat | xs2.q | x16 | |
| 9 | Xeon Ivy Bridge | 16/2 | smith/rafiki/tiamat | xi1.q | x16 | |
| 10 | Xeon Haswell | 24/2 | smith/rafiki/tiamat | xh1.q | x24 | InfiniBand |
| 11 | Xeon Haswell | 24/2 | smith/rafiki/tiamat | xh2.q | x24 | InfiniBand |
| 13 | Xeon Broadwell | 32/2 | smith/rafiki/tiamat | xb1.q | x32 | InfiniBand |
| 14 | Xeon Skylake | 32/2 | smith/rafiki/tiamat | x17.q | x32 | InfiniBand |

NOTE:
- To submit a job to the group 7 nodes, log in to sb100 and execute qsub there.
- To submit a job to the nodes of the other groups, log in to smith (or rafiki/tiamat) and execute qsub there.

*** Groups 4, 5 "xe" system [#e02a591e]
The "xe" system is composed of nodes with Xeon CPUs; each node has 2 CPUs (8 or 12 cores in total). The parallel environments are x8 (group 4) and x12 (group 5).

*** Group 7 "sb100" system [#m1b620b9]
The "sb100" system is based on Core i7 CPUs with the Sandy Bridge architecture. Each node has 1 CPU (6 cores) and 16 GB of memory. Fast calculations are possible thanks to the AVX instructions. The parallel environment is x6.

*** Group 8 "xs" system [#j5c755e2]
The "xs" system is based on Xeon CPUs with the Sandy Bridge architecture. Each node has 2 CPUs (16 cores) and 32 GB of memory. Fast calculations are possible thanks to the AVX instructions. The parallel environment is x16.

*** Group 9 "xi" system [#ldcc860e]
The "xi" system is based on Xeon CPUs with the Ivy Bridge architecture. Each node has 2 CPUs (16 cores) and 128 GB of memory. Fast calculations are possible thanks to the AVX instructions. This system is recommended for Gaussian calculations. The parallel environment is x16.

*** Groups 10, 11 "xh" system [#oc5448db]
The "xh" system is composed of nodes with 2 Xeon Haswell CPUs (24 cores in total) and 64 GB of memory. The parallel environment is x24.

*** Group 13 "xb" system [#h9aaaca9]
The "xb" system is composed of nodes with 2 Xeon Broadwell CPUs (32 cores in total) and 64 GB of memory. The parallel environment is x32.

*** Group 14 "x17" system [#h004b4e7]
The "x17" system is composed of nodes with 2 Xeon Skylake CPUs (32 cores in total) and 64 GB of memory. The parallel environment is x32.
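The queue and parallel environment can also be chosen on the qsub command line instead of in the script; in Grid Engine, command-line options take precedence over the #$ directives embedded in the job script. For example, using the queue/parallel-environment pairs from the table above:

 $ qsub -q xi1.q -pe x16 16 job.sh   # one group-9 (Ivy Bridge) node
 $ qsub -q xh1.q -pe x24 48 job.sh   # two group-10 (Haswell) nodes
 $ qsub -q xb1.q -pe x32 32 job.sh   # one group-13 (Broadwell) node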
** Network structure [#pa8ce3cd]

"|" indicates a network connection; "[ ]" encloses the name of a computer node.

 Engineering intranet, ODINS network
  |
  |   Backbone network (no access from outside the engineering network)
  |     |
  +- [smith] -----+ 133.1.116.161  Login & application server & backup server & file server
  +- [rafiki] ----+ 133.1.116.162  Login & application server & backup server
  +- [tiamat] ----+ 133.1.116.211  Login & application server
  |               |
  |               +-- [xe00], [xe01]         Calc. nodes, group 4  (8 cores (2 CPUs) per node)   paral. env. = x8   queue = xe1.q
  |               +-- [xe02]-[xe06]          Calc. nodes, group 5  (12 cores (2 CPUs) per node)  paral. env. = x12  queue = xe2.q
  |               +-- [xs01]-[xs18]          Calc. nodes, group 8  (16 cores (2 CPUs) per node)  paral. env. = x16  queue = xs2.q
  |               +-- [xi01]-[xi12]          Calc. nodes, group 9  (16 cores (2 CPUs) per node)  paral. env. = x16  queue = xi1.q
  |               +-- [xh01]-[xh17],
  |                   [xh19]-[xh34]          Calc. nodes, group 10 (24 cores (2 CPUs) per node)  paral. env. = x24  queue = xh1.q
  |               +-- [xh18], [xh35]-[xh43]  Calc. nodes, group 11 (24 cores (2 CPUs) per node)  paral. env. = x24  queue = xh2.q
  |               +-- [xb01]-[xb14]          Calc. nodes, group 13 (32 cores (2 CPUs) per node)  paral. env. = x32  queue = xb1.q
  |               +-- [x1701]-[x1706]        Calc. nodes, group 14 (32 cores (2 CPUs) per node)  paral. env. = x32  queue = x17.q
  |
  +- [sb100] -----+ 133.1.116.165  Login node for the "sb100" system (group 7)
                  +-- [sb101]-[sb120]        Calc. nodes, group 7  (6 cores (1 CPU) per node)    paral. env. = x6   queue = all.q
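To check how busy these nodes are before submitting, Grid Engine's qstat can summarize the queues from any login node; both of the following are standard qstat options:

 $ qstat -g c    # one-line summary per cluster queue (load, used/total slots)
 $ qstat -f      # full listing, one line per queue instance (i.e. per node)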