Unity Frequently Asked Questions

This page documents answers to questions that we've been asked or that we're surprised we haven't been asked yet. These questions could be about working in Unity's environment, a difference between Unity and OSC (or other HPC environments), or unexpected behavior of Unity (without judgement of why this is unexpected or who finds it unexpected).

 

Access and file transfer

File transfer

Unity runs an sftp service; the simplest way to move files to or from Unity is by using an sftp or scp command on your local computer.
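
For example, to copy a file to or from your Unity home directory from a terminal on your local computer (results.csv is just a placeholder name):

$ scp results.csv name.#@unity.asc.ohio-state.edu:
$ scp name.#@unity.asc.ohio-state.edu:results.csv .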

Windows OpenSSH

Recent versions of Windows 10 include an implementation of the OpenSSH client, which lets Windows users work with commands such as ssh and scp at the command line (PowerShell and Windows Terminal can make this a productive environment). However, the message authentication code (MAC) that the Windows OpenSSH client uses by default is not compatible with recent security enhancements on Unity login nodes. To ssh into a Unity login node from a Windows client, whether on campus or through the ASC VPN, you need to specify a different MAC. Something like this works:

> ssh -m hmac-sha2-512 name.#@unity.asc.ohio-state.edu
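
If you don't want to type the MAC option every time, you can put it in your OpenSSH client configuration file (a sketch; on Windows this file is typically C:\Users\<you>\.ssh\config, and "unity" is just a nickname you choose):

Host unity
    HostName unity.asc.ohio-state.edu
    User name.#
    MACs hmac-sha2-512

After that, ssh unity (and scp file.txt unity:) will pick up those settings.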

General issues

Don't request all of a node's memory

While the list of compute nodes shows the total amount of memory for each node, not all of that memory is available for jobs; some is required for the node's operating system. This means that if you request 192 GB, for example, the job cannot run on the nodes with 192 GB (or less) of memory. Instead, it will try to run on a node with more than 192 GB (with our current nodes on Unity, at least 256 GB). You may have to wait for one of those larger nodes to become available, or the job may not be able to run at all because the nodes with the most memory are exclusive. The better approach is to ask for a little less than the full amount of memory on a node (say, 184 GB to run on a 192-GB node).
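
For example, in a Slurm job script (see the Slurm notes further down this page), a memory request that fits on a 192-GB node might look like this:

#SBATCH --mem=184G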

How do I change my default shell?

You interact with Unity through a program called a shell; over the years, many shells have been developed. Probably the most commonly-used shell is bash, which is the default for Unity. In a stand-alone Linux system, you can change your default shell with the chsh command. However, Unity gets its user information from ASCTech's directory service; to change your default shell on Unity, submit a request or send email to asctech@osu.edu.

Using R and Python in Unity

R and Python are installed locally on some (but not all) of the compute nodes. This can lead to confusion: you can't be sure whether R or Python is present on a given node or which version it is, so load the appropriate modules instead of relying on the local installations.

To use Python, you should first load an appropriate Python module. Even if you're going to use Python from a conda environment, you'll need to first load a Python module in order to access the conda command.
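
For example (using the Python module version that appears in the conda examples later on this page; run module avail python to see what's available):

$ module load python/3.7-2020.02
$ conda --version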

When using R (whether in batch mode or interactive mode), first load an appropriate compiler module and then load R.

module load intel 
module load R

You can also specify a particular version of R if you don't want the default (after loading the compiler module, run module avail to see what versions of R are available).

module load gnu/9.1.0
module load R/4.0.2
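
To see what's there before picking a version, you can do something like this (a minimal sketch; the exact module names on Unity may differ):

$ module load intel
$ module avail R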

Install R, Python and Perl packages

Like most HPC centers, we ask users to install their own packages and modules for some environments.

OSC has instructions for installing R, Python and Perl packages.

You can also compile and install your own software.

Common issues when installing packages in R

/tmp mounted with noexec

ERROR: 'configure' exists but is not executable when compiling

"Error: Failed to install <package> from GitHub: Could not find tools necessary to compile a package" in R

It's common for a Linux environment to provide temporary space for building or installing things such as R or Python packages. Traditionally, that location is the /tmp directory, although it can be changed with the $TMPDIR environment variable. Unfortunately, many known exploits use /tmp to compile and run malicious code, so preventing binaries from executing there is a common security practice: many recent versions of Linux, including the one on our login nodes, mount /tmp with the "noexec" option. That unfortunately means legitimate package installation cannot happen there either, which can result in a mysterious failure to install a package (the error messages are often not very informative). The fix is to create your own tmp directory and point $TMPDIR at it.

$ mkdir ~/tmp
$ export TMPDIR=~/tmp

The mkdir command needs to be done only once (unless you later delete your tmp directory). You probably want to run the export command (or its analogue for whatever shell you're using) manually just before installing a package that needs it, rather than putting it in your .bashrc file; compute nodes set their own $TMPDIR, which points at the nodes' local hardware for performance, and you don't want to override that.
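
If you'd rather not leave $TMPDIR set in your login shell at all, you can set it just for the R session that does the installing (a bash sketch):

$ mkdir -p ~/tmp              # only needed once
$ TMPDIR=~/tmp R              # TMPDIR applies only to this R session

Then run install.packages() inside that session as usual.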

Makevars

Using install.packages() in R typically involves downloading source code and compiling and installing your package in your home directory. Usually this is C, C++ or Fortran code. If you were compiling these things outside of R, you could include options on the command line or in a make file. It's less convenient to include options with install.packages(), but for some packages you have to.

Sometimes you see fleeting lines of output during compilation like this:

lmrob.c(2709): error: identifier "i" is undefined
for(int i=0; i < n; i++) { 

Under the old C89 standard, you couldn't declare a variable inside a for() loop (the "int i=0" part); the newer C99 standard allows it. When the compiler defaults to the older standard, you get errors like the one above, but you can tell it to use C99. If you were typing the compile command yourself, you'd just add a flag. To get R to do this, use a file called Makevars in the .R directory in your home directory (so ~/.R/Makevars). You'll probably have to create both .R and Makevars. At a Linux command prompt, do this:

$ mkdir ~/.R
$ cd ~/.R
$ vi Makevars

Note the dot before R and that R is upper-case; use whatever text editor you like in place of vi. Makevars needs just one line:

CFLAGS= -std=c99

That flag tells the C compiler to use the C99 standard. Then start R and run install.packages(). After you're done installing packages that need C99, remember to get rid of the Makevars file; otherwise it will be used the next time you install a package, when you may not want it. You can simply rename Makevars to Makevars.c99 so that R won't find it and you won't need to remember all of this the next time you do need it (it can be convenient to keep a collection of Makevars.XXX files for different cases: you can give the C and Fortran compilers different flags or change which compilers get used, for example). A similar flag is -std=gnu99, which is C99 plus GNU extensions.
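
For example, a minimal way to set the file aside and bring it back when needed (assuming the ~/.R/Makevars location above):

$ mv ~/.R/Makevars ~/.R/Makevars.c99      # keep it around, but out of R's way
$ cp ~/.R/Makevars.c99 ~/.R/Makevars      # restore it the next time you need c99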

There's more information on Makevars in the R Installation and Administration manual. A Google search on the initial error message during compilation can help you identify the pathology and the fix.

C++ extensions

Some packages have another problem. A line like this

/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found

is telling you about missing C++ extensions. The fix for this is to load cxx17 before starting R. So altogether:

$ ml intel
$ ml cxx17
$ ml R/3.5.1

Then start R and run install.packages().

Failed dependencies

Occasionally a package installation fails because an earlier installation of one of its dependencies failed but left enough behind for later installations to think the dependency is okay to use. An example might make this clearer: we recently saw this error during installation of the vctrs package.

Error: package or namespace load failed for ‘vctrs’:
.onLoad failed in loadNamespace() for 'vctrs', details:
call: library.dynam(lib, package, package.lib)
error: shared object ‘backports.so’ not found

In the appropriate local library path (~/R/x86_64-pc-linux-gnu-library/3.5), there was a directory named backports, but there was also a directory named 00LOCK-backports. The install.packages() function creates the corresponding 00LOCK- directory while it's installing a package and removes it when the package installs successfully; if the installation fails, the 00LOCK- directory is left behind. It isn't much help in telling you what went wrong, but running remove.packages("backports") and then install.packages("vctrs") allowed vctrs to be installed successfully.
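
If you suspect a half-finished installation, a quick way to check for leftover lock directories from the command line (using the library path from this example) is:

$ ls -d ~/R/x86_64-pc-linux-gnu-library/3.5/00LOCK-*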

Adding conda environments to Jupyter Notebook kernels

Add ipykernel to your conda environment and register the kernel; you only need to do this once per environment.

When building a new conda environment

$ ml python/3.7-2020.02
$ conda create --name tf25 tensorflow=2.5 ipykernel
$ conda activate tf25
$ python -m ipykernel install --user --name envs-tf25 --display-name "Python (tf25)"
$ conda deactivate

When using an existing conda environment

$ ml python/3.7-2020.02
$ conda activate tf24
$ conda install ipykernel
$ python -m ipykernel install --user --name envs-tf24 --display-name "Python (tf24)"
$ conda deactivate

To use such an environment from Jupyter Notebook in OnDemand, start Jupyter Notebook and select the name of your environment's kernel from the New button in the upper right.
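
If you want to confirm that the kernel was registered before launching Jupyter, you can list your kernels from the command line (assuming the Python module you loaded provides the jupyter command):

$ jupyter kernelspec list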

Slurm notes and issues

Lifetime of a simple job

Submit a job

You submit a job with the sbatch command.

$ sbatch sleep.sh

A Slurm job script is similar to the Torque/Moab scripts used previously on Unity. Here's a simple one; it just writes a message to a file, sleeps for 90 seconds, and writes another message.

#!/usr/bin/env bash

#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=sbatch-sleep-test
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<name.#>@osu.edu

cd $SLURM_SUBMIT_DIR

echo "Going to sleep:  `date`" >> sbatch-sleep-test.txt
sleep 90
echo "Awake:  `date`" >> sbatch-sleep-test.txt

Is it running?

You can see all the jobs that are running:

$ squeue

You can see the status of your jobs (replace <name.#> with your own name.#).

$ squeue -u <name.#>
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 1003     batch sbatch-s   shew.1  R       0:07      1 u066

You can get more information on using squeue with squeue --help or man squeue.
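
For example, to check on a single job by its job ID (using the job from the example above):

$ squeue -j 1003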

Done

By default, a Slurm job writes its standard output and standard error to a file named slurm-<job_id>.out in the directory from which the job was submitted (in the example above, we redirected standard output to another file, so an empty slurm-<job_id>.out file was created).

As with PBS jobs, you can ask Slurm to send you email when a job begins and ends. However, email from Slurm is not as informative; all the useful information is in the email's subject and the body is empty. For example, completion of the above job resulted in email from "SLURM User" with the subject line "Slurm Job_id=1003 Name=sbatch-sleep-test Ended, Run time 00:01:31, COMPLETED, ExitCode 0". From the value of ExitCode you can infer whether the job succeeded (usually ExitCode 0) or failed (usually ExitCode not 0), but not much else.

PBS compatibility

Slurm includes a qsub command, which is actually a Perl script that translates a PBS job script into suitable Slurm directives at run time. This allows you to continue to use existing PBS scripts, although there's no guarantee as to how long Slurm will continue to provide that script or how well it works with more complicated PBS scripts.

$ qsub job_script.pbs

Interactive sessions

Using salloc and srun

To run an interactive session, you first use the salloc command to allocate the desired resources on the cluster, then connect a shell to those resources by using the srun command. Here's the simplest example of this:

[shew.1@unity-login1 ~] $ salloc
salloc: Pending job allocation 2811719
salloc: job 2811719 queued and waiting for resources
salloc: job 2811719 has been allocated resources
salloc: Granted job allocation 2811719
salloc: Waiting for resource configuration
salloc: Nodes u123 are ready for job

[shew.1@unity-login1 ~] $ srun --jobid=2811719 --pty /bin/bash
[shew.1@u123 ~] $

This allocates the default of one node, one core, and 3 GB of memory for one hour on a compute node that you have permission to run on. If you want more resources or a specific node, salloc has many options; type salloc --help or man salloc to see them. For example, to ask for eight cores and 120 GB of memory on node u123, you can say this:

$ salloc --nodelist=u123 --ntasks=8 --mem=120g --time=04:00:00

Many of the arguments have short versions; the previous command could have been written:

$ salloc -w u123 -n 8 --mem=120g -t 04:00:00

The memory flag does not have a short version. Note also that the srun command requires the job ID from the salloc command.

A further bit of manual work is required here: You type exit to quit the shell you started with the srun command, but the allocation remains. To free resources for other users, you need to remember to cancel the allocation after you're done with it:

[shew.1@u123 ~] $ exit
[shew.1@unity-login1 ~] $ scancel 2811719

Using sinteractive

A simpler way to get an interactive session is to use sinteractive, which is a Perl script that combines salloc and srun, at the expense of a smaller set of options. For example, you can get a similar interactive session, including a shell connected to the allocated resources, with just one command:

[shew.1@unity-login1 ~] $ sinteractive -w u123 -M 120000 -t 04:00:00

An oddity of sinteractive is that the memory request has to be entered as an integer number of megabytes--it doesn't understand 120g or 120gb.

A further simplification is that when you type exit to quit the shell, the allocated resources are freed.

 

 
