=== Whenever Singularity is used (as it is here), bind pertinent directories ===

{{{
export SINGULARITY_BINDPATH="/gs3,/gs4,/gs5,/gs6,/gs7,/gs8,/gs9,/gs10,/gs11,/gpfs,/spin1,/data,/scratch,/fdb,/lscratch"
}}}

You might want to put this in your ~/.bashrc or ~/.bash_profile so it is loaded automatically upon login.

=== Run a CANDLE benchmark ===

This is the most straightforward way to make sure everything is working; you don't have to run it to completion.

==== (1) Set variables ====

{{{
working_dir=   # directory in which to run the benchmark and collect output
gpu_type=      # GPU type to request (used in the sbatch --gres option below)
}}}

==== (2) Clone the CANDLE benchmarks from GitHub ====

{{{
mkdir ~/candle
cd ~/candle
git clone https://github.com/ECP-CANDLE/Benchmarks.git
}}}

==== (3) Run the benchmark ====

{{{
cd $working_dir
echo '#!/bin/bash' > ./jobrequest.sh
echo "module load singularity" >> ./jobrequest.sh
echo "singularity exec --nv /data/classes/candle/candle-gpu.img python /data/`whoami`/candle/Benchmarks/Pilot1/P1B1/p1b1_baseline_keras2.py" >> ./jobrequest.sh
sbatch --partition=gpu --mem=50G --gres=gpu:$gpu_type:1 ./jobrequest.sh
}}}

You should see your job queued or running in SLURM (e.g., {{{squeue -u $(whoami)}}}) and output being produced in $working_dir. You can also SSH into the node on which the job is running (listed under "NODELIST (REASON)" in the {{{squeue}}} output) and confirm that the node's GPU is being used by running {{{nvidia-smi}}}. Once you know everything is working, you can kill the job with {{{scancel <job_id>}}}, where {{{<job_id>}}} is listed under JOBID in the {{{squeue}}} output. Or, if you're interested, you can let the job run; it should take about 30 minutes.

=== Run a grid search (a type of hyperparameter optimization) using output from a test model ===

In our case the test model just returns random numbers, but this lets you exercise the complete workflow you'll ultimately need for running your own model.

==== (1) Set variables ====

{{{
working_dir=   # directory in which to run the grid search
expt_name=     # a name for this experiment
ntasks=        # should be greater than 2
job_time=      # SLURM time limit for the job
memory=        # memory to request (passed to sbatch --mem below)
gpu_type=      # GPU type to request
}}}

==== (2) Copy the grid search template to the working directory ====

{{{
cp -rp /data/classes/candle/grid-search-template/* $working_dir
}}}

==== (3) Edit one file ====

In $working_dir/swift/swift-job.sh, change {{{./turbine-workflow.sh}}} to {{{swift/turbine-workflow.sh}}}.

==== (4) "Compile" and run the grid search ====

{{{
cd $working_dir
echo '#!/bin/bash' > ./compile_job.sh
echo "module load singularity" >> ./compile_job.sh
echo "singularity exec /data/classes/candle/candle-gpu.img swift/stc-workflow.sh $expt_name" >> ./compile_job.sh
sbatch -W --time=1 ./compile_job.sh
experiment_id=${expt_name:-experiment}
sbatch --output=experiments/$experiment_id/output.txt --error=experiments/$experiment_id/error.txt --partition=gpu --gres=gpu:$gpu_type:1 --cpus-per-task=2 --ntasks=$ntasks --mem=$memory --job-name=$experiment_id --time=$job_time --ntasks-per-node=1 swift/swift-job.sh $experiment_id
}}}

=== Run a grid search using your own model ===

We have already transferred the CANDLE scripts to a local directory (in the examples above, working_dir=~/grid_search). With this directory structure in place, we will now adapt some of these scripts to your own data and model.
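Before editing anything for your own model, it can help to look over what the template copy put in place. A quick check might look like the following sketch (the listed contents are indicative only and may differ from what you actually see):

{{{
cd $working_dir
ls            # top level of the grid search template copied earlier
ls swift      # workflow scripts referenced above, e.g., swift-job.sh, stc-workflow.sh, turbine-workflow.sh
ls experiments 2>/dev/null   # only present once an experiment has been run
}}}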