This page documents answers to questions that we've been asked or that we're surprised we haven't been asked yet. These questions could be about working in Unity's environment, a difference between Unity and OSC (or other HPC environments), or unexpected behavior of Unity (without judgement of why this is unexpected or who finds it unexpected).
Access and file transfer
File transfer
Unity runs an sftp service; the simplest way to move files to or from Unity is by using an sftp or scp command on your local computer.
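For example, here's a hedged sketch of copying a file from your local computer to your Unity home directory (the file name is a placeholder; replace name.# with your own):
$ scp results.csv name.#@unity.asc.ohio-state.edu:~/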
Windows OpenSSH
Recent versions of Windows 10 include an implementation of the OpenSSH client. This allows Windows users to work with commands such as ssh and scp at the command line (PowerShell and Windows Terminal can make this a productive environment). However, the message authentication code (MAC) that the Windows OpenSSH client uses by default is not compatible with recent security enhancements on Unity login nodes. To ssh into a Unity login node from a Windows client on campus or through the ASC VPN, you need to specify a different MAC. Something like this works:
> ssh -m hmac-sha2-512 name.#@unity.asc.ohio-state.edu
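If you connect often, you can make this setting persistent. Here's a minimal sketch of an entry in the OpenSSH client configuration file (~/.ssh/config under your Windows user profile); the host alias is just a convenience and name.# is a placeholder:
Host unity
    HostName unity.asc.ohio-state.edu
    User name.#
    MACs hmac-sha2-512
With that in place, ssh unity (or scp to the unity alias) uses the required MAC automatically.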
General issues
Don't request all of a node's memory
While the list of compute nodes shows the total amount of memory for each node, not all of that memory is available for jobs; some is required for the node's operating system. This means that if you request 192 GB, for example, the job cannot run on nodes with 192 GB (or less) of memory. Instead, the job will try to run on a node with more than 192 GB (with our current nodes on Unity, at least 256 GB). You may have to wait for one of the larger nodes to become available, or the job may not be able to run at all because the nodes with the most memory are exclusive. To avoid this, ask for a little less than the full amount of memory available on a node (say, 184 GB to run on a 192-GB node).
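For example, in a job script targeting one of the 192-GB nodes described above, the memory request might look like this (a sketch; adjust the value for the node type you want):
#SBATCH --mem=184g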
How do I change my default shell?
You interact with Unity through a program called a shell; over the years, many shells have been developed. Probably the most commonly used shell is bash, which is the default on Unity. On a stand-alone Linux system, you can change your default shell with the chsh command. However, Unity gets its user information from ASCTech's directory service; to change your default shell on Unity, submit a request or send email to asctech@osu.edu.
Loading R and Python
R and Python are installed locally on some (but not all) of the compute nodes. This can lead to some confusion--you don't know whether R or Python is there or what version it might be.
To use Python, you should first load an appropriate Python module. Even if you're going to use Python from a conda environment, you'll need to load a Python module first in order to access the conda command.
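For example, a hedged sketch (the module name is a placeholder; run module avail python to see what's actually installed):
$ module load python
$ conda env list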
When using R (whether in batch mode or interactive mode), first load an appropriate compiler module and then load R.
module load intel
module load R
You can also specify a particular version of R if you don't want the default (after loading the compiler module, run module avail to see what versions of R are available).
module load gnu/9.1.0
module load R/4.0.2
Install R, Python and Perl packages
Like most HPC centers, we ask users to install their own packages and modules for some environments.
OSC has instructions for installing R, Python and Perl packages.
You can also compile and install your own software.
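For example, a hedged sketch of a per-user Python package install (the package name is a placeholder); installing R packages with install.packages() is covered in more detail below:
$ pip install --user somepackage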
Common issues when installing packages in R
/tmp mounted with noexec
Typical symptoms are errors such as "ERROR: 'configure' exists but is not executable" when compiling, or "Error: Failed to install <package> from GitHub: Could not find tools necessary to compile a package" in R.
It's common for a Linux environment to provide some sort of temporary space for building or installing things such as R or Python packages. Traditionally, that temporary location is the /tmp directory, although it can be changed with the $TMPDIR environment variable. Unfortunately, many known exploits use /tmp to compile and run malicious code, so preventing binaries from executing there is a common security practice. To guard against that, many recent versions of Linux, including the one on our login nodes, mount the /tmp directory with a "noexec" option, which means that legitimate package installation cannot happen there either. This can result in a mysterious failure to install a package (the error messages are often not very informative). A fix is to create your own tmp directory and set $TMPDIR to point to that.
$ mkdir ~/tmp
$ export TMPDIR=~/tmp
The mkdir command needs to be done only once (unless you later delete your tmp directory). You probably want to run the export command (or its analogue for the shell you're using) manually just before installing a package that needs it, rather than putting it in your .bashrc file; compute nodes have their own idea of $TMPDIR, which points at the compute nodes' local hardware for performance.
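If you'd rather not change the setting in your current shell at all, you can set it just for the R process that does the install (a sketch, assuming an R module is already loaded):
$ TMPDIR=~/tmp R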
Makevars
Using install.packages() in R typically involves downloading source code, then compiling and installing the package in your home directory. Usually this is C, C++ or Fortran code. If you were compiling these things outside of R, you could include options on the command line or in a makefile. It's less convenient to include options with install.packages(), but for some packages you have to.
Sometimes you see fleeting lines of output during compilation like this:
lmrob.c(2709): error: identifier "i" is undefined
for(int i=0; i < n; i++) {
Older standards of the C language (before C99) did not allow you to declare a variable inside a for() loop (the "int i=0" part), and the compiler may default to one of those older standards. You can tell the compiler to use the C99 standard instead. If you were typing the compile command yourself, you'd just add a flag. To get R to do this, you use a file called Makevars in the .R directory in your home directory (so ~/.R/Makevars). You'll probably have to create both .R and Makevars. So at a Linux command prompt, do this:
$ mkdir ~/.R
$ cd ~/.R
$ vi Makevars
There's a dot before R, R has to be upper-case, and you can use whatever text editor you want to edit Makevars. Makevars just needs one line:
CFLAGS= -std=c99
That gives the C compiler a flag telling it to use the C99 standard. Then start R and run install.packages(). After you're done installing packages that need C99, remember to get rid of the Makevars file, or it will be used the next time you install a package, when you may not need it. You can just rename Makevars to Makevars.c99 so that R won't find it but you won't need to remember all this the next time you do need it (it can be convenient to keep several Makevars.XXX files for different cases--you can give the C and Fortran compilers different flags or change which compilers get used, for example). A similar flag is gnu99.
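For example (a sketch, using the Makevars.c99 name suggested above):
$ mv ~/.R/Makevars ~/.R/Makevars.c99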
There's more information on Makevars in the R Installation and Administration manual. A Google search on the initial error message during compilation can help you identify the pathology and the fix.
C++ extensions
Some of the packages have another problem. A line like this
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found
is telling you about missing C++ extensions. The fix for this is to load cxx17 before starting R. So altogether:
$ ml intel
$ ml cxx17
$ ml R/3.5.1
Then start R and run install.packages().
Failed dependencies
Occasionally a package installation can fail because a previous installation of a dependent package failed but left enough of the dependent package behind to let subsequent installations think that the dependent package is okay to use. An example might make this clearer. We recently saw this error during installation of the vctrs package.
Error: package or namespace load failed for ‘vctrs’:
.onLoad failed in loadNamespace() for 'vctrs', details:
call: library.dynam(lib, package, package.lib)
error: shared object ‘backports.so’ not found
In the appropriate local library path (~/R/x86_64-pc-linux-gnu-library/3.5), there was a directory named backports, but there was also a directory named 00LOCK-backports. The install.packages() function creates the corresponding 00LOCK- directory while it's installing a package and removes it when the package is successfully installed. If the package installation fails, the 00LOCK- directory doesn't get deleted. It's apparently not much use in telling you what went wrong, but running remove.packages("backports") and then install.packages("vctrs") allowed vctrs to be installed successfully.
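In R, the recovery looked like this (using the package names from the example above):
> remove.packages("backports")
> install.packages("vctrs")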
Slurm notes and issues
Lifetime of a simple job
Submit a job
You submit a job with the sbatch command.
$ sbatch sleep.sh
A Slurm job script is similar to the Torque/Moab scripts used previously on Unity. Here's a simple one; it just writes a message to a file, sleeps for 90 seconds, and writes another message.
#!/usr/bin/env bash
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=sbatch-sleep-test
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<name.#>@osu.edu
cd $SLURM_SUBMIT_DIR
echo "Going to sleep: `date`" >> sbatch-sleep-test.txt
sleep 90
echo "Awake: `date`" >> sbatch-sleep-test.txt
Is it running?
You can see all the jobs that are running:
$ squeue
You can see the status of your own jobs (replace <name.#> with your own name.#):
$ squeue -u <name.#>
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1003 batch sbatch-s shew.1 R 0:07 1 u066
You can get more information on using squeue with squeue --help or man squeue.
Done
By default, a Slurm job writes its standard output and standard error to a file named slurm-<job_id>.out in the directory from which the job was submitted (in the example above, we redirected standard output to another file, so an empty slurm-<job_id>.out file was created).
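If you want that file to have a more descriptive name, Slurm's --output directive can be added to the job script (a sketch; %j is Slurm's placeholder for the job ID):
#SBATCH --output=sbatch-sleep-test-%j.out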
As with PBS jobs, you can ask Slurm to send you email when a job begins and ends. However, email from Slurm is not as informative; all the useful information is in the email's subject and the body is empty. For example, completion of the above job resulted in email from "SLURM User" with the subject line "Slurm Job_id=1003 Name=sbatch-sleep-test Ended, Run time 00:01:31, COMPLETED, ExitCode 0". From the value of ExitCode you can infer whether the job succeeded (usually ExitCode 0) or failed (usually ExitCode not 0), but not much else.
PBS compatibility
Slurm includes a qsub command, which is actually a Perl script that translates a PBS job script into suitable Slurm directives at run time. This allows you to continue to use existing PBS scripts, although there's no guarantee as to how long Slurm will continue to provide that script or how well it works with more complicated PBS scripts.
$ qsub job_script.pbs
Interactive sessions
Using salloc and srun
To run an interactive session, you first use the salloc command to allocate the desired resources on the cluster, then connect a shell to those resources by using the srun command. Here's the simplest example of this:
[shew.1@unity-login1 ~] $ salloc
salloc: Pending job allocation 2811719
salloc: job 2811719 queued and waiting for resources
salloc: job 2811719 has been allocated resources
salloc: Granted job allocation 2811719
salloc: Waiting for resource configuration
salloc: Nodes u123 are ready for job
[shew.1@unity-login1 ~] $ srun --jobid=2811719 --pty /bin/bash
[shew.1@u123 ~] $
This allocates the default one node, one core and 3 GB of memory for one hour on a compute node that you have permission to run on. If you want more resources or a specific node, salloc has many options; type salloc --help or man salloc to see them. For example, to ask for eight cores and 120 GB of memory on node u123, you can say this:
$ salloc --nodelist=u123 --ntasks=8 --mem=120g --time=04:00:00
Many of the arguments have short versions; the previous command could have been written:
$ salloc -w u123 -n 8 --mem=120g -t 04:00:00
Apparently the memory flag does not have a short version. Note also that the srun command requires the job ID reported by the salloc command.
A further bit of manual work is required here: typing exit quits the shell you started with the srun command, but the allocation remains. To free resources for other users, remember to cancel the allocation after you're done with it:
[shew.1@u123 ~] $ exit
[shew.1@unity-login1 ~] $ scancel 2811719
Using sinteractive
A simpler way to get an interactive session is to use sinteractive, which is a Perl script that combines salloc and srun, at the expense of a smaller set of options. For example, you can get a similar interactive session, including a shell connected to the allocated resources, with just one command:
[shew.1@unity-login1 ~] $ sinteractive -w u123 -M 120000 -t 04:00:00
An oddity of sinteractive is that the memory request has to be entered as an integer number of megabytes--it doesn't understand 120g or 120gb.
A further simplification is that when you type exit to quit the shell, the allocated resources are freed.