Python on Strelka

The Strelka Computer Cluster has both Python 2 and 3 installed. Most Python programs run as a single process, so moving them to a cluster will not speed up execution unless the code is parallelized. If you need to run the same code repeatedly or with different inputs, you can launch many single-process jobs on the cluster at the same time. Alternatively, you can parallelize your code using modules such as multiprocessing, Dask, or mpi4py. The Cornell University Center for Advanced Computing publishes material on Python for High Performance, which may be useful.
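
As a rough illustration of launching many single-process jobs at once, the following sketch uses a Slurm job array. It assumes a hypothetical script named analyze.py and input files named input_1.txt through input_10.txt; the job name and log file pattern are likewise illustrative, and Strelka may require additional options (such as a partition or time limit) that are not shown here.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=py-array
# Launch 10 independent array tasks, numbered 1 through 10
#SBATCH --array=1-10
# Each array task runs a single process
#SBATCH --ntasks=1
# Separate log per task: %A is the array job ID, %a is the task index
#SBATCH --output=py-array_%A_%a.out

# Slurm sets SLURM_ARRAY_TASK_ID to this task's index; it is used here
# to pick a different (hypothetical) input file for each task.
python3 analyze.py "input_${SLURM_ARRAY_TASK_ID}.txt"
```

If saved as array_job.sh (a hypothetical file name), the script would be submitted with sbatch array_job.sh. Here python3 is whichever interpreter is first on your PATH when the job runs.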

While the system versions of Python can be used for many typical needs, it is generally advised not to rely on them and instead to create a custom Python environment using anaconda/miniconda. The main reason is that the system Python and its packages can be updated at any time, possibly breaking your code. Maintaining your own Python environment ensures consistent results and gives you complete control over versions. Package management is simpler as well: if you use the system Python and require a package that is not available, you must install it locally within your home directory, which becomes even more complex if you need a newer or older version of a package than the one available on the system.

Miniconda

The preferred way to use Python within a shared computing environment is with anaconda/miniconda. The difference is that anaconda comes with several scientific packages already installed, whereas miniconda is a minimal installation of Python, allowing you to create environments with only the packages you need. It is therefore recommended to use miniconda and install packages as needed. Miniconda installs in your personal home directory and enables users to create multiple Python environments and switch among them; for example, you can have one environment with Python 3.6 and another with 3.10, and each can have different versions of various packages. While a complete tutorial is beyond the scope of this document, here is some basic information to get started.

To install miniconda, start by logging into Strelka; you should by default end up in your home directory. The following commands will download miniconda and then start the installation process; generally, you can accept the defaults, and after installation you will need to log out and back in for certain environment variable changes to take effect.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Update miniconda:

conda update conda

Miniconda is most useful for creating and managing multiple Python environments. The getting started documentation is a great place to start, but here are some common commands (more advanced commands are listed on the conda cheat sheet).

Create an environment named testenv and add the biopython package (any needed dependencies will be installed automatically); note that while you can add packages to and use the default environment (named base), this is not advised and you should generally create and use a named environment for most projects:

conda create --name testenv biopython

Create an environment with a specific Python version:
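
A minimal sketch, assuming an environment named py310env and Python 3.10 (both are illustrative choices, not names taken from Strelka):

```shell
# Create a new environment with a pinned Python version
conda create --name py310env python=3.10
```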

To use a particular environment, you must "activate" it, and similarly you can "deactivate" it; note that you can also include these commands in your Slurm scripts and submission files:
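
A minimal sketch, assuming the testenv environment created earlier:

```shell
# Switch into the environment; the shell prompt usually shows its name
conda activate testenv
# Leave the environment and return to the previous one
conda deactivate
```

Inside a batch script, conda activate may only work once the job's shell has been initialized for conda, depending on how your shell startup files are configured.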

List all available environments (the current active environment will be marked with an *):
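
One way to do this with a standard conda command:

```shell
# Show every environment conda knows about, with its path
conda env list
```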

Install a package within the currently active environment, list all packages installed in the environment, and then uninstall a package:
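
A minimal sketch, using scipy purely as an illustrative package name:

```shell
# Install a package into the currently active environment
conda install scipy
# List every package installed in the currently active environment
conda list
# Uninstall the package again
conda remove scipy
```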

Update all packages within a particular environment:
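
A minimal sketch, assuming the environment is the testenv environment created earlier:

```shell
# Update every package in the named environment to the latest compatible version
conda update --name testenv --all
```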

Install a specific version of a package:
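
A minimal sketch; the package (scipy) and version (1.10) are illustrative, and the version you request must be compatible with the Python version in the active environment:

```shell
# Pin the package to a particular release series in the active environment
conda install scipy=1.10
```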

Delete an environment and all its packages:
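
A minimal sketch, assuming the environment to delete is testenv; deactivate it first if it is currently active:

```shell
# Remove the named environment together with all of its packages
conda remove --name testenv --all
```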

You can also export or import whole environments, which is useful for working with a team or lab to ensure that everyone is using an identical environment:
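
A minimal sketch, assuming the file name environment.yml (an illustrative choice):

```shell
# Export the active environment's packages and versions to a YAML file
conda env export > environment.yml
# Recreate the environment from that file, for example under another account
conda env create --file environment.yml
```

The exported file records the environment name, so the recreated environment will have the same name unless you edit the file first.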

System Python

Python 3

Run Python 3.6:
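
A minimal sketch, assuming the system interpreter is available under the command name python3 (the exact name on Strelka is an assumption here):

```shell
# Confirm which Python 3 version the command resolves to
python3 --version
# Start an interactive Python 3 session
python3
```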

Use pip to install Python packages in your home directory (in ~/.local/):
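
A minimal sketch, assuming pip is available for the system Python 3; package_name is a placeholder for the package you need:

```shell
# --user installs into ~/.local/ instead of the system-wide location
python3 -m pip install --user package_name
```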

Python 2

While Python 2 has been deprecated for several years, some older code relies on it and it remains available for use. If at all possible, however, Python 3 is preferred. Run Python 2.7:
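
A minimal sketch, assuming the system interpreter is available under the command name python2 (the exact name on Strelka is an assumption here):

```shell
# Confirm which Python 2 version the command resolves to
python2 --version
# Start an interactive Python 2 session
python2
```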

Use pip to install Python packages in your home directory (in ~/.local/):
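
A minimal sketch, assuming pip is available for the system Python 2; package_name is a placeholder, and recent releases of many packages no longer support Python 2:

```shell
# --user installs into ~/.local/ instead of the system-wide location
python2 -m pip install --user package_name
```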

Jupyter Notebooks

It is possible to run Python in a Jupyter notebook on Strelka. For more information, see Running Jupyter on Strelka.
