Smith

"Smith" is a computer cluster based on Intel and Intel-compatible CPUs.

Login nodes

To use the "Smith" system, log in to the following nodes:

  • [smith] 133.1.116.161
  • [rafiki] 133.1.116.162
  • [tiamat] 133.1.116.211

To use the "sb100" system, use the following node:

  • [sb100] 133.1.116.165

How to log in to a login node

To log in to "smith", type

ssh -l [userID] 133.1.116.161

or

ssh [userID]@133.1.116.161

To enable X11 forwarding, use

ssh -Y -l [userID] 133.1.116.161

or

ssh -Y [userID]@133.1.116.161
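
The login addresses and the two ssh forms above can be wrapped in a small shell helper. The following is only a minimal sketch and is not part of the Smith setup: the function names login_ip and ssh_command are hypothetical, and the helper only prints the command line so that it can be checked before it is run.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by the Smith system).

# Print the address of one of the login nodes listed above.
login_ip() {
    case "$1" in
        smith)  echo "133.1.116.161" ;;
        rafiki) echo "133.1.116.162" ;;
        tiamat) echo "133.1.116.211" ;;
        sb100)  echo "133.1.116.165" ;;
        *)      echo "unknown login node: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the ssh command for a user ID and a login node.
# Pass "x11" as the third argument to add -Y for X11 forwarding.
ssh_command() {
    local user="$1" node="$2" mode="${3:-}" ip
    ip="$(login_ip "$node")" || return 1
    if [ "$mode" = "x11" ]; then
        echo "ssh -Y ${user}@${ip}"
    else
        echo "ssh ${user}@${ip}"
    fi
}
```

For example, with the placeholder user ID alice, "ssh_command alice rafiki x11" prints "ssh -Y alice@133.1.116.162".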

Currently, the following message appears upon login

-bash: /usr/local/g09/D01/g09/bsd/g09.profile: Permission denied

but in most cases it does not affect your work.

NOTE: When you log in for the first time, change your initial password by typing

passwd

How to submit your jobs

To run your program, use the queueing system, usually via a job script (see below). For instance, to run the script "job.sh" on one node (24 cores) in group 10, type

qsub -q xh1.q -pe x24 24 job.sh

Note that the queue (which selects the group) and the number of cores can also be specified in the job script. To see the job status, type

qstat

To see the jobs of a specific user, type

qstat -u [userID]

To cancel a job, use

qdel [job ID]

where the job ID can be obtained by using qstat (the number appearing in the first column).
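
As an illustration of picking the job ID out of the first column, the sketch below filters qstat-style text with awk. The helper name job_ids is hypothetical, and the sample text is made up; it assumes the usual layout in which a header line and a separator line precede the job lines.

```shell
#!/bin/bash
# Hypothetical helper: read qstat output on standard input and print
# the job IDs, i.e. the purely numeric entries in the first column.
# Header and separator lines are skipped because they are not numeric.
job_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Sample text in the assumed qstat layout (made-up jobs, illustration only).
sample_qstat='job-ID  prior   name  user   state submit/start at     queue        slots
------------------------------------------------------------------------
  12345 0.55500 CO    alice  r     01/01/2020 10:00:00 xh1.q@xh01   24
  12346 0.55500 CO    alice  qw    01/01/2020 10:05:00              24'

# Prints 12345 and 12346, one per line.
printf '%s\n' "$sample_qstat" | job_ids
```

On the cluster, the same filter could be fed from "qstat -u [userID]", and each printed ID could then be passed to qdel.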

Example of a job script

  • Groups 10, 11, 12
    #$ -S /bin/bash
    #$ -cwd
    #$ -q xh1.q
    #$ -pe x24 24
    #$ -N CO
    source /opt/setting/2016.4/intel-compiler.sh
    source /opt/setting/2016.4/intel-mpi.sh
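    MPI_COMMAND=mpirun
    # Export the settings so that mpirun and awk (via ENVIRON) can see them
    export I_MPI_PIN=1
    export I_MPI_FABRICS=shm:dapl
    export OMP_NUM_THREADS=1
    # Write one "host:processes" line per allocated node
    cat $PE_HOSTFILE | awk '{ print $1":"$2/ENVIRON["OMP_NUM_THREADS"] }' > hostfile.$JOB_ID
    $MPI_COMMAND ./a.out < input.dat > output.dat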
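
To show what the awk line in the job script produces, the sketch below applies the same conversion to a made-up $PE_HOSTFILE. It assumes that each line of the file has the form "host slots queue processor-range"; the function name make_hostfile and the sample host names are for illustration only. The thread count is passed with awk -v rather than through ENVIRON so that the example does not depend on exported variables.

```shell
#!/bin/bash
# Sketch of the hostfile step in the job script, run on a made-up
# PE hostfile. Assumed line format: host, slots, queue, processor range.

# Convert a PE hostfile into "host:processes" lines, where
# processes = slots / threads per MPI process (OMP_NUM_THREADS).
make_hostfile() {
    local pe_hostfile="$1" threads="$2"
    awk -v t="$threads" '{ print $1 ":" $2 / t }' "$pe_hostfile"
}

sample="$(mktemp)"
printf '%s\n' 'xh01 24 xh1.q@xh01 UNDEFINED' 'xh02 24 xh1.q@xh02 UNDEFINED' > "$sample"

make_hostfile "$sample" 1   # one thread per process: xh01:24, xh02:24
make_hostfile "$sample" 2   # two threads per process: xh01:12, xh02:12
rm -f "$sample"
```

With OMP_NUM_THREADS=1, as in the job script, every slot becomes one MPI process; a larger thread count reduces the number of processes per node accordingly.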

Computer nodes and queues

Group  Processor             Cores/CPUs per node  Submission node      Queue  Paral. environ.  Inter-node
4      xeon                  8/2                  smith/rafiki/tiamat  xe1.q  x8
5      xeon                  12/2                 smith/rafiki/tiamat  xe2.q  x12
7      core i7 sandy-bridge  6/1                  sb100                all.q  x6
8      xeon sandy-bridge     16/2                 smith/rafiki/tiamat  xs2.q  x16
9      xeon ivy-bridge       16/2                 smith/rafiki/tiamat  xi1.q  x16
10     xeon Haswell          24/2                 smith/rafiki/tiamat  xh1.q  x24              InfiniBand
11     xeon Haswell          24/2                 smith/rafiki/tiamat  xh2.q  x24              InfiniBand
13     xeon Broadwell        32/2                 smith/rafiki/tiamat  xb1.q  x32              InfiniBand
14     xeon Skylake          32/2                 smith/rafiki/tiamat  x17.q  x32              InfiniBand

NOTE:

  • To submit a job to group 7 nodes, log in to sb100 and run qsub
  • To submit a job to nodes in any other group, log in to smith, rafiki, or tiamat and run qsub
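
The queue and parallel environment for each group can be collected in a small lookup function. The sketch below is hypothetical (the name qsub_options is not part of the system). It takes the queue names and core counts from the table above, takes group 9 from the network structure listing, and assumes that a job on N nodes requests N times the number of cores per node as slots.

```shell
#!/bin/bash
# Hypothetical helper: print the qsub options for a node group.
# Assumption: a job on N nodes requests N x (cores per node) slots.
qsub_options() {
    local group="$1" nodes="${2:-1}" queue cores
    case "$group" in
        4)  queue=xe1.q; cores=8  ;;
        5)  queue=xe2.q; cores=12 ;;
        7)  queue=all.q; cores=6  ;;   # submit from sb100
        8)  queue=xs2.q; cores=16 ;;
        9)  queue=xi1.q; cores=16 ;;
        10) queue=xh1.q; cores=24 ;;
        11) queue=xh2.q; cores=24 ;;
        13) queue=xb1.q; cores=32 ;;
        14) queue=x17.q; cores=32 ;;
        *)  echo "unknown group: $group" >&2; return 1 ;;
    esac
    echo "-q $queue -pe x$cores $((nodes * cores))"
}

qsub_options 10      # prints: -q xh1.q -pe x24 24
qsub_options 13 2    # prints: -q xb1.q -pe x32 64
```

With this helper, "qsub $(qsub_options 10) job.sh" expands to the command "qsub -q xh1.q -pe x24 24 job.sh" shown earlier.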

Network structure

  • "|" indicates a network connection; "[]" encloses the name of a computer node

    + Engineering intranet, ODINS network
    |
    |           Backbone network (no access from outside the engineering network)
    |               |
    +- [smith] -----+                          133.1.116.161 Login & application server & backup server & file server 
    +- [rafiki] ----+                          133.1.116.162 Login & application server & backup server 
    +- [tiamat] ----+                          133.1.116.211 Login & Application server
    |               |
    |               +-- [xe00], [xe01]         Calc. node, group 4 (each node has 8 cores (2CPUs)) paral. env.=x8 queue=xe1.q
    |               +-- [xe02]-[xe06]          Calc. node, group 5 (each node has 12 cores (2CPUs)) paral. env.=x12 queue=xe2.q
    |               |
    |               +-- [xs01]-[xs18]          Calc. node, group 8 (each node has 16 cores (2CPUs)) paral. env.=x16 queue=xs2.q
    |               |
    |               +-- [xi01]-[xi12]          Calc. node, group 9 (each node has 16 cores (2CPUs)) paral. env.=x16 queue=xi1.q
    |               |
    |               +-- [xh01]-[xh17]
    |               +-- [xh19]-[xh34]          Calc. node, group 10 (each node has 24 cores (2CPUs)) paral. env.=x24 queue=xh1.q
    |               +-- [xh18],[xh35]-[xh43]   Calc. node, group 11 (each node has 24 cores (2CPUs)) paral. env.=x24 queue=xh2.q
    |               +-- [xb01]-[xb14]          Calc. node, group 13 (each node has 32 cores (2CPUs)) paral. env.=x32 queue=xb1.q
    |               +-- [x1701]-[x1706]        Calc. node, group 14 (each node has 32 cores (2CPUs)) paral. env.=x32 queue=x17.q
    |               |
    |               |
    +- [sb100] -----+                          133.1.116.165 Login node for group 7
                    |
                    +-- [sb101]-[sb120]        Calc. node, group 7 (each node has 6 cores (1 CPU))