Python Virtual Environments
Introduction:
Python is a popular programming language on many types of systems, including HPC systems. There are so many add-on packages for Python that it doesn't make sense to try to install all of them in a centrally managed Python environment. So how can you install the packages that you need? With Virtual Environments. A Virtual Environment (VE) is a self-contained Python environment that lives in your own user space, separate from the centrally managed installation. Once you have created and activated a VE, you can install packages into it. You can create multiple VEs for different tasks, which makes it easier to manage packages that have conflicting version requirements.
Creating and using Anaconda3 Virtual Environments
For the most part, we use Anaconda3 as the Python distribution on the ACG HPC systems. To load Anaconda3 into your shell environment, run the following in a terminal session on Katahdin:
module load anaconda3
By default, loading the module does not initialize conda's Virtual Environment support. This matters because the VE system and SLURM do not always work well together. To initialize conda and activate your "base" Virtual Environment, run:
$INIT_CONDA
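$INIT_CONDA appears to be a site-specific variable set by the anaconda3 module; its exact contents are not described here and may differ between systems. If activation fails later, a quick way to see what it runs and whether conda is initialized in your current shell is:

```shell
# Show the command stored in INIT_CONDA (site-specific; contents vary by system)
echo "$INIT_CONDA"

# Ask the shell what "conda" currently refers to; only the first line is shown
type conda | head -n 1
```

If initialization worked, the second command typically reports that conda is a shell function. That function is what allows "conda activate" to change the environment of your current shell.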
Next, create a conda Virtual Environment. The name is the last parameter and can be anything you like:
conda create --name cadillac
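Note that "conda create" with only a name and no package list normally creates an empty environment (unless your conda configuration adds default packages); the Package Plan in the session below lists no packages at all. If you plan to run python or pip from the VE, you can name Python when creating it instead. A sketch, where python=3.9 is only an example version and assumes it is available from your configured channels:

```shell
# Create the environment with its own Python interpreter
# (3.9 is an example; choose the version your code needs)
conda create --name cadillac python=3.9

# List your environments; the currently active one is marked with "*"
conda env list
```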
Then activate the VE so that you can use it and install packages into it:
conda activate cadillac
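After "conda activate", conda sets CONDA_DEFAULT_ENV to the environment's name (when activated by name) and CONDA_PREFIX to its directory, and your prompt shows the name in parentheses. In a batch job there is no prompt to look at, so a small check can be handy. The function below is a hypothetical helper, not part of conda:

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which conda sets when an environment is activated.
require_conda_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "expected conda environment '$1', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
    # Report where the active environment lives
    echo "using $1 at ${CONDA_PREFIX:-unknown prefix}"
}
```

For example, after "conda activate cadillac", running "require_conda_env cadillac" prints the environment's location; in any other environment it prints an error and returns a non-zero status.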
From there, you can install packages into your new VE using the "conda" command or the "pip" command:
conda install numpy
pip install numpy
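A plain "pip" command installs into whichever Python installation it belongs to, so it is worth checking that it belongs to the VE. The sketch below assumes the VE contains its own Python; if it does not, "python" and "pip" resolve to another installation. A common general recommendation is to install what you can with conda first and use pip only for packages conda does not provide.

```shell
conda activate cadillac

# Both should point inside the VE's directory (e.g. ~/.conda/envs/cadillac)
which python
python -m pip --version

# Run pip through the VE's own Python so packages are installed into the VE
python -m pip install numpy

# Show the package as conda sees it; pip-installed packages list "pypi"
# in the Channel column
conda list numpy
```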
Putting it all together, here is a session that creates a VE called "cadillac" and installs the zlib package into it:
[cousins@katahdin ~]$ module load anaconda3
[cousins@katahdin ~]$ $INIT_CONDA
(base) [cousins@katahdin ~]$ conda create --name cadillac
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.8.3
latest version: 4.12.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/cousins/.conda/envs/cadillac
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate cadillac
#
# To deactivate an active environment, use
#
# $ conda deactivate
(base) [cousins@katahdin ~]$ conda activate cadillac
(cadillac) [cousins@katahdin ~]$ conda install zlib
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.8.3
latest version: 4.12.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: /home/cousins/.conda/envs/cadillac
added / updated specs:
- zlib
The following NEW packages will be INSTALLED:
_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
libgcc-ng conda-forge/linux-64::libgcc-ng-11.2.0-h1d223b6_15
libgomp conda-forge/linux-64::libgomp-11.2.0-h1d223b6_15
libzlib conda-forge/linux-64::libzlib-1.2.11-h166bdaf_1014
zlib conda-forge/linux-64::zlib-1.2.11-h166bdaf_1014
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To use a VE in a cluster job, put these commands in your SLURM batch script:
module load anaconda3
$INIT_CONDA
conda activate cadillac
After these lines, use the python command to run your Python script as usual.
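A complete batch script might look like the sketch below. The #SBATCH values, the job name, and my_script.py are placeholders, and your site may also require options such as a partition or account, so check local documentation before relying on it.

```shell
#!/bin/bash
# Job name shown in squeue
#SBATCH --job-name=cadillac-job
# One task on one CPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Walltime limit (hh:mm:ss)
#SBATCH --time=01:00:00
# Log file named <job name>-<job id>.out
#SBATCH --output=%x-%j.out

# Set up conda the same way as in an interactive session
module load anaconda3
$INIT_CONDA
conda activate cadillac

# Run the script with the VE's Python (my_script.py is a placeholder)
python my_script.py
```

Saved as, for example, cadillac_job.sh (a hypothetical filename), the script would be submitted with "sbatch cadillac_job.sh".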