
Using MPI on the HYDRA supercomputer

[id : 249] [25/02/2008]


This step-by-step example guides the HYDRA user through the setup of an MPI parallel environment, allowing programmes to be executed on multiple CPUs.

1. A small parallel C programme, 'hello_world.c', is used as the example source code:


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* Initialise the MPI environment. */
    MPI_Init(&argc, &argv);

    /* Determine this process's rank and the total number of processes. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Report which node this process runs on. */
    MPI_Get_processor_name(name, &len);
    printf("Hello world! I'm %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    exit(0);
}



2. First we compile and link this programme against the MPI libraries, using the mpicc compiler wrapper:

/opt/hpmpi/bin/mpicc hello_world.c
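
Without further options, mpicc names the executable 'a.out', which is what the rest of this guide uses. If a more descriptive name is preferred, the standard -o option of the compiler wrapper can be used (the name 'hello_world' below is only an illustration):

/opt/hpmpi/bin/mpicc -o hello_world hello_world.c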


3. We now run the executable with 4 processes on a single node, 'hydra3':

time /opt/hpmpi/bin/mpirun -np 4 a.out


Hello world! I'm 2 of 4 on hydra3
Hello world! I'm 0 of 4 on hydra3
Hello world! I'm 1 of 4 on hydra3
Hello world! I'm 3 of 4 on hydra3
1.53s real 0.03s user 0.07s system



4. Next we need to generate SSH keys in order to allow passwordless access between the nodes.
The file 'key.sh', containing the following commands, is executed:


cd
# Generate a DSA key pair; the empty lines in the here-document accept
# the default key location and an empty passphrase.
ssh-keygen -t dsa<<EOF


EOF
cd .ssh
# Authorise the freshly generated public key for logins to this account.
cp id_dsa.pub authorized_keys
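
The same keys can also be generated non-interactively with ssh-keygen's -f and -N options; the sketch below is an equivalent alternative that appends to authorized_keys instead of overwriting it and restricts the file to the owner:

cd
# Create a DSA key pair at the default location with an empty passphrase.
ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
# Authorise the new public key and keep the file private.
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys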



5. The next step is creating an LSF (Load Sharing Facility) batch job.
Parallel calculations on multiple nodes can only be run through the batch system.

The job script named 'sub' will be submitted to queue MPI16 and will run on 16 processors. This means two nodes, each with four dual-core Opteron CPUs, will be used.


#BSUB -q MPI16
#BSUB -o log
#BSUB -n 16
/opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out
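

LSF also allows the job to be given a name and a per-job output file. The variant below uses the -J option and the %J placeholder, which LSF expands to the job ID. It is only a sketch: the job name 'hello_mpi' is arbitrary and the available queues depend on the site configuration.

#BSUB -q MPI16
#BSUB -J hello_mpi
#BSUB -o log.%J
#BSUB -n 16
/opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out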



6. We will now submit the job to the LSF system:

bsub <sub


Job <27933> is submitted to queue <MPI16>.
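

The same request can also be expressed entirely on the bsub command line, without #BSUB directives in a script (an equivalent sketch):

bsub -q MPI16 -o log -n 16 /opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out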



7. Check the status:

bjobs -u hpeeters


No unfinished job found


bjobs -d


JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
27933 hpeeter DONE MPI16 hydra3 8*hydra28 *6 ./a.out Oct 13 10:16
8*hydra27
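

For a job that is still pending or running, two other standard LSF commands are useful (the job ID from above is used purely as an illustration):

bpeek 27933
bkill 27933

bpeek shows the output the job has produced so far; bkill cancels the job.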



8. Check the history:

bhist -d


Summary of time in seconds spent in various states:
JOBID USER JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
27933 hpeeter *./a.out 4 0 11 0 0 0 15
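

A more detailed, event-by-event history of a single job can be obtained with the long format:

bhist -l 27933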



9. Inspect the output produced:

cat log


Sender: LSF System <lsfadmin@hydra28>
Subject: Job 27933: <#BSUB -q MPI16;#BSUB -o log;#BSUB -n 16;/opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out> Done

Job <#BSUB -q MPI16;#BSUB -o log;#BSUB -n 16;/opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out> was submitted from host <hydra3> by user <hpeeters>.
Job was executed on host(s) <8*hydra28>, in queue <MPI16>, as user <hpeeters>.
<8*hydra27>
</u/hpeeters> was used as the home directory.
</bfucc/cc/hpeeters/mpi> was used as the working directory.
Started at Fri Oct 13 10:16:55 2006
Results reported at Fri Oct 13 10:17:06 2006

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -q MPI16
#BSUB -o log
#BSUB -n 16
/opt/hpmpi/bin/mpirun -lsb_hosts -np 16 ./a.out

------------------------------------------------------------

Successfully completed.

Resource usage summary:

CPU time: 63.72 sec.
Max Memory: 2 MB
Max Swap: 10 MB

Max Processes: 1
Max Threads: 1

The output (if any) follows:

Hello world! I'm 8 of 16 on hydra27
Hello world! I'm 2 of 16 on hydra28
Hello world! I'm 3 of 16 on hydra28
Hello world! I'm 4 of 16 on hydra28
Hello world! I'm 7 of 16 on hydra28
Hello world! I'm 6 of 16 on hydra28
Hello world! I'm 0 of 16 on hydra28
Hello world! I'm 1 of 16 on hydra28
Hello world! I'm 5 of 16 on hydra28
Hello world! I'm 9 of 16 on hydra27
Hello world! I'm 11 of 16 on hydra27
Hello world! I'm 10 of 16 on hydra27
Hello world! I'm 12 of 16 on hydra27
Hello world! I'm 13 of 16 on hydra27
Hello world! I'm 14 of 16 on hydra27
Hello world! I'm 15 of 16 on hydra27


We see that this job took a total of 11 seconds of wall-clock time on 16 processors, with a cumulated CPU time of 63.72 seconds.

Herman Peeters - Herman.Peeters@vub.ac.be

http://webnotes.ulb.ac.be/&noteid=249
