=== Whenever Singularity is used (as it is here), bind pertinent directories ===

{{{
export SINGULARITY_BINDPATH="/gs3,/gs4,/gs5,/gs6,/gs7,/gs8,/gs9,/gs10,/gs11,/gpfs,/spin1,/data,/scratch,/fdb,/lscratch"
}}}

You might want to put this in your ~/.bashrc or ~/.bash_profile so it is loaded automatically upon login.

=== Run a CANDLE benchmark ===

This is the most straightforward way to make sure everything is working; you don't have to run it to completion.

==== (1) Set variables ====

{{{
working_dir=   # directory in which to run the benchmark and collect output
gpu_type=      # GPU type to request (used in the sbatch --gres option below)
}}}

==== (2) Clone the CANDLE benchmarks from GitHub ====

{{{
mkdir ~/candle
cd ~/candle
git clone https://github.com/ECP-CANDLE/Benchmarks.git
}}}

==== (3) Run the benchmark ====

{{{
cd $working_dir
echo '#!/bin/bash' > ./jobrequest.sh
echo "module load singularity" >> ./jobrequest.sh
echo "singularity exec --nv /data/classes/candle/candle-gpu.img python /data/`whoami`/candle/Benchmarks/Pilot1/P1B1/p1b1_baseline_keras2.py" >> ./jobrequest.sh
sbatch --partition=gpu --mem=50G --gres=gpu:$gpu_type:1 ./jobrequest.sh
}}}

You should see your job queued or running in SLURM (e.g., {{{squeue -u $(whoami)}}}) and output being produced in $working_dir. You can also SSH into the node on which the job is running (listed under "NODELIST (REASON)" in the {{{squeue}}} output) and confirm that the node's GPU is being used by running {{{nvidia-smi}}}. Once you know everything is working, you can kill the job with {{{scancel <job_id>}}}, where {{{<job_id>}}} is listed under JOBID in the {{{squeue}}} output. Or, if you're interested, you can let the job run; it should take about 30 minutes.

=== Run a grid search (a type of hyperparameter optimization) using output from a test model ===

In our case the test model just returns random numbers, but this lets you exercise the complete workflow you'll ultimately need for running your own model.

==== (1) Set variables ====

{{{
working_dir=   # directory in which to run the grid search
expt_name=     # a name for this experiment
ntasks=        # should be greater than 2
job_time=      # SLURM time limit for the job
memory=        # memory to request (passed to sbatch --mem below)
gpu_type=      # GPU type to request
}}}

==== (2) Copy the grid search template to the working directory ====

{{{
cp -rp /data/classes/candle/grid-search-template/* $working_dir
}}}

==== (3) Edit one file ====

In $working_dir/swift/swift-job.sh, change {{{./turbine-workflow.sh}}} to {{{swift/turbine-workflow.sh}}}.

==== (4) "Compile" and run the grid search ====

{{{
cd $working_dir
echo '#!/bin/bash' > ./compile_job.sh
echo "module load singularity" >> ./compile_job.sh
echo "singularity exec /data/classes/candle/candle-gpu.img swift/stc-workflow.sh $expt_name" >> ./compile_job.sh
sbatch -W --time=1 ./compile_job.sh
experiment_id=${expt_name:-experiment}
sbatch --output=experiments/$experiment_id/output.txt --error=experiments/$experiment_id/error.txt --partition=gpu --gres=gpu:$gpu_type:1 --cpus-per-task=2 --ntasks=$ntasks --mem=$memory --job-name=$experiment_id --time=$job_time --ntasks-per-node=1 swift/swift-job.sh $experiment_id
}}}

=== Run a grid search using your own model ===

We have already transferred the CANDLE scripts to a local directory (in the examples above, working_dir=~/grid_search). With this directory structure in place, we will now adapt some of these scripts to your own data and model.
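Before editing anything for your own model, it can help to look over what the template copy put in place. A quick check might look like the following sketch (the listed contents are indicative only and may differ from what you actually see):

{{{
cd $working_dir
ls            # top level of the grid search template copied earlier
ls swift      # workflow scripts referenced above, e.g., swift-job.sh, stc-workflow.sh, turbine-workflow.sh
ls experiments 2>/dev/null   # only present once an experiment has been run
}}}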