Cloud computing with GPUs

In this post we will walk through setting up a personal cloud-based computing environment with GPU access using the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) platform.

If you are reading this post, you probably already have a few reasons for offloading some of your computations to the cloud. Still, I’ll list some of my own:

GPU access. It’s no secret that (for some types of computations) using a GPU can lead to massive speedups.
On-demand interactive sessions. Even if you have access to a GPU through a university computing cluster, these resources can often have queues, and interactive sessions may be low priority. If you want to spend a few hours profiling or optimizing code, it’s nice to not have to wait.
Software control. Another downside of shared resources is not being able to update or install any software you want. Having your own environment, on the other hand, lends you complete control.

The main downside of using AWS instead of a university cluster is the cost. At the time of writing, my simple GPU setup costs about $0.50 per hour to run. Depending on how much you customize your environment, you may incur additional storage costs.

This guide will be focused on accessing AWS EC2 resources through command line tools, which I find a bit more convenient for regular use than Amazon’s web-based interface. Also, at the end of this post I will show you how to automate this process in a script, so that in the future you can access your workstation with a simple shell command and not have to navigate the online process every time. Nonetheless, the online dashboard will be helpful as we get things set up for the first time. Some nice guides which use the online dashboard are here and here.

Get an account

The first step is to create an account. While you’re at it, it’s a good idea to set up an IAM (Identity and Access Management) user so you don’t have to do everything as the root user.

Once you’ve logged in, you’ll also have to put in a service quota increase request. By default, new users do not have access to GPUs, likely to prevent accidental charges. To do this, you can click on your username in the upper right corner and then click “Service Quotas” to navigate to the service quotas page. On the left-hand menu select “AWS services” and then scroll down to “Amazon Elastic Compute Cloud (EC2)” and click. You’ll want to request at least four vCPUs for “Running On-Demand G and VT instances”. The G instances are the most basic instances with GPU access, and the smallest of these instances has four vCPUs.
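Once the AWS CLI from the next section is installed and configured, you can also check the approved limit from the command line. The following is a minimal sketch, not an official recipe: the helper names gpu_vcpu_quota and gpu_quota_ok are my own, and it assumes the quota is listed under exactly the name shown in the Service Quotas console.

```shell
# Hypothetical helper: print the current vCPU limit for G and VT instances.
# Assumes the quota name matches the one shown in the Service Quotas console.
gpu_vcpu_quota() {
    aws service-quotas list-service-quotas \
        --service-code ec2 \
        --query "Quotas[?QuotaName=='Running On-Demand G and VT instances'].Value" \
        --output text
}

# Hypothetical helper: succeed if the limit covers the vCPUs you need (default 4).
gpu_quota_ok() {
    local needed quota
    needed=${1:-4}
    quota=$(gpu_vcpu_quota)
    # An empty or zero quota compares as 0 and therefore fails the check
    awk -v q="$quota" -v n="$needed" 'BEGIN { exit !(q + 0 >= n + 0) }'
}
```

Running gpu_quota_ok 4 && echo "ready for a g4dn.xlarge" is one way to use it before trying to launch a GPU instance.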

Install AWS CLI

Next we install the AWS command line interface tool. The full instructions are here and I recommend using them. Here I’ll summarize the steps for macOS.

Run these commands to download and install the AWS command-line interface:

curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
sudo installer -pkg AWSCLIV2.pkg -target /

Afterwards, check that things are indeed installed by running

which aws
aws --version

You should see details on the version you have just installed.

Finally we need to configure the AWS CLI. Run

aws configure

and you will be prompted for some credentials. In the online dashboard, click your username in the upper right-hand corner and select “Security credentials”. Scroll down to “Access Keys” and create an access key. AWS may prompt you to do this as an IAM user if you created one when you set up your account. Enter the access key ID and the secret access key at the command-line prompts, then specify a default region (e.g. us-east-1) and a default output format (e.g. json or text).
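To confirm that the configuration took, it can help to ask AWS who it thinks you are. This is a small sketch under my own naming (aws_check is hypothetical); it only relies on the sts get-caller-identity and configure get subcommands.

```shell
# Hypothetical helper: confirm the CLI can authenticate and has a default region.
aws_check() {
    local region
    # Prints the ARN of the identity behind the credentials; fails if they are rejected
    aws sts get-caller-identity --query 'Arn' --output text || return 1
    region=$(aws configure get region)
    if [ -z "$region" ]; then
        echo "No default region set; run 'aws configure' again." >&2
        return 1
    fi
    echo "Default region: $region"
}
```

If the first line of output names the root account rather than your IAM user, you probably entered the wrong access key.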

Generate key

This step and the next can be executed on the AWS web-based dashboard if you prefer. We want to generate a key pair to ensure that only users with access to the key (you) can connect to the server.

This command generates a key pair and saves the output in a file my-key.pem for later use. Of course, feel free to rename the key or file. You can create multiple keys for different purposes if you so desire.

aws ec2 create-key-pair \
    --key-name my-key \
    --query 'KeyMaterial' \
    --output text > my-key.pem

Now protect the key so that only you can access it with

chmod 600 my-key.pem

and move it to a safe location on your computer.
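If you want the move and the permission change in one step, something like the following would work. It is only a sketch: install_key is a name I made up, and ~/.ssh as the default destination is my assumption about what counts as a safe location.

```shell
# Hypothetical helper: move a key file into a directory (default ~/.ssh)
# and make it readable and writable by you alone.
install_key() {
    local key dir dest
    key=$1
    dir=${2:-$HOME/.ssh}
    dest="$dir/$(basename "$key")"
    mkdir -p "$dir" || return 1
    mv "$key" "$dest" || return 1
    chmod 600 "$dest" || return 1
    # Print the new location so it can be passed to ssh -i later
    echo "$dest"
}
```

For example, install_key my-key.pem prints the new path of the key, which is the value to use after ssh -i when connecting.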

Create security group

A security group is simply a set of rules determining who can access your instance. We’ll make a very simple security group. First, create a security group with

aws ec2 create-security-group \
    --group-name my-security-group \
    --description "Basic security group for my computer"

changing the values of the group name and description as you see fit. Note the value of the GroupId of the security group you just created. Next we decide who is included in the security group. If you only want your computer to be able to access the instance, run

aws ec2 authorize-security-group-ingress \
    --group-id my-security-group-id \
    --protocol tcp \
    --port 22 \
    --cidr your.ip.address.here/32

replacing my-security-group-id with the GroupId of the group you just created, and replacing your.ip.address.here with your IP address. You can check your IP address by running curl https://checkip.amazonaws.com. If you want to allow ssh traffic from anywhere (helpful if you use a VPN or multiple devices), run

aws ec2 authorize-security-group-ingress \
    --group-id my-security-group-id \
    --protocol tcp \
    --port 22 \
    --cidr 0.0.0.0/0
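For the single-address rule, the create and authorize steps can be chained so that you never have to copy the GroupId or your IP address by hand. This is a rough sketch under a few assumptions: the helper names to_cidr and make_ssh_group are hypothetical, the address check is deliberately crude (digits and dots only), and your public address is whatever checkip.amazonaws.com reports.

```shell
# Hypothetical helper: turn a bare IPv4 address into a single-host CIDR block.
to_cidr() {
    case "$1" in
        ''|*[!0-9.]*) echo "not an IPv4 address: '$1'" >&2; return 1 ;;
    esac
    printf '%s/32\n' "$1"
}

# Hypothetical helper: create a group and open port 22 to your current address only.
make_ssh_group() {
    local name ip cidr gid
    name=$1
    ip=$(curl -s https://checkip.amazonaws.com) || return 1
    cidr=$(to_cidr "$ip") || return 1
    # --query picks the GroupId out of the response so it can be reused directly
    gid=$(aws ec2 create-security-group \
        --group-name "$name" \
        --description "SSH access for $name" \
        --query 'GroupId' --output text) || return 1
    aws ec2 authorize-security-group-ingress \
        --group-id "$gid" --protocol tcp --port 22 --cidr "$cidr" >/dev/null || return 1
    echo "$gid"
}
```

Calling make_ssh_group my-security-group prints the new GroupId. If your address changes later, the rule has to be updated, which is the main reason some people fall back to 0.0.0.0/0.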

Launch instance

Now we can launch our instance. For the first time, it’s probably wise to launch a small instance with a bare-bones image (an image is just a snapshot of what software is installed). Use the “Launch Instance” interface in the AWS console to explore available instances and images. You can launch your instance from here too of course, and this is probably easier for the first time. However, we’ll want to be able to use the command line so we can automate this whole process for the future.

As an example, I found a bare-bones Ubuntu Linux image with an image-id of ami-053b0d53c279acc90 in the “Quick Start” section of the online “Launch Instance” interface. I’ll use this image on an instance with one CPU and 1GB of memory called t2.micro. The command is

aws ec2 run-instances \
    --image-id ami-053b0d53c279acc90 \
    --count 1 \
    --instance-type t2.micro \
    --key-name my-key \
    --security-groups my-security-group

If I wanted to launch a g4dn.xlarge instance, which has four vCPUs, 16GB of memory, and an NVIDIA T4 GPU, I would want to use an Ubuntu image that already has the NVIDIA driver installed and configured, such as ami-02e75a29096474262. The command in this case would be

aws ec2 run-instances \
    --image-id ami-02e75a29096474262 \
    --count 1 \
    --instance-type g4dn.xlarge \
    --key-name my-key \
    --security-groups my-security-group

There are images available with PyTorch or TensorFlow preinstalled if you are a Python user. Make a note of the instance ID and image ID you are using for later reference.

Before connecting, use

aws ec2 describe-instances

to find the PublicDnsName of your instance, and make a note of it.
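The full describe-instances output is long, so a --query expression helps narrow it down. Here is a minimal sketch; list_instances and running_dns are hypothetical names of mine, and the field names are the ones that appear in the describe-instances output.

```shell
# Hypothetical helper: print "instance-id  state  public-dns" for every instance
# in the default region, one instance per line.
list_instances() {
    aws ec2 describe-instances \
        --query 'Reservations[].Instances[].[InstanceId,State.Name,PublicDnsName]' \
        --output text
}

# Hypothetical helper: print the public DNS name of each running instance.
running_dns() {
    list_instances | awk '$2 == "running" && $3 != "" { print $3 }'
}
```

With a single instance running, running_dns prints exactly the name to paste into the ssh command.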

Connect

Now that our instance is up and running, we can ssh in with

ssh -i path-to-key/my-key.pem ubuntu@public-dns-here

where you replace the filepath to your key and the public DNS name with the appropriate values. If you’re not using an Ubuntu image, the default username will be different.
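As a rough guide to which username to try, here is a small lookup. Treat it as a sketch: default_user is a hypothetical helper, the family labels are my own shorthand, and the usernames are the defaults as I understand them from the AWS documentation, so check the documentation for your particular image.

```shell
# Hypothetical helper: guess the default login user from an image family label.
default_user() {
    case "$1" in
        ubuntu*)        echo ubuntu ;;
        amazon-linux*)  echo ec2-user ;;
        debian*)        echo admin ;;
        *)              echo "unknown image family: $1" >&2; return 1 ;;
    esac
}
```

For instance, ssh -i path-to-key/my-key.pem "$(default_user amazon-linux)@public-dns-here" would log in as ec2-user.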

That’s it! You should be logged into the instance at this point.

Install software

You likely need to install some software for the work you do. I use Julia for my computationally intensive work, and I’ll walk through the installation process as an example.

Download and install Julia by running

wget https://julialang-s3.julialang.org/bin/linux/x64/1.9/julia-1.9.2-linux-x86_64.tar.gz
tar zxvf julia-1.9.2-linux-x86_64.tar.gz

from your instance. Then start Julia with julia-1.9.2/bin/julia. You can also move the binary or add the binary to your $PATH for your convenience.
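To put Julia on your $PATH, one option is a small idempotent helper, so that sourcing your shell configuration twice does not add the directory twice. This is a sketch that assumes the tarball was unpacked in your home directory on the instance; path_prepend is a hypothetical name.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumes the tarball was unpacked in your home directory on the instance
path_prepend "$HOME/julia-1.9.2/bin"
```

Adding these lines to ~/.bashrc on the instance before saving the image means a plain julia command should work in every later session.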

Once Julia is running, install whatever basic packages you think you’ll frequently need. For example, to use the NVIDIA GPU for computations, we’ll want the CUDA.jl package.

using Pkg; Pkg.add("CUDA")
using CUDA
CUDA.versioninfo()

Save your image

Once we’ve installed our basic software, we’ll want to save a new image containing the current state of our instance. Then we won’t have to repeat the installation process every time we restart it. You can do this through the web interface or through

aws ec2 create-image \
    --instance-id my-instance-id \
    --name my-image-name

where my-instance-id is the id of the instance you are running, and my-image-name will be the new name of your image.
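Creating an image takes a few minutes, and terminating the instance too early is an easy mistake. A possible way to guard against it is to wait for the image before moving on. This is a sketch with a hypothetical save_image helper; it relies on the CLI's ec2 wait image-available subcommand.

```shell
# Hypothetical helper: save an image of an instance and block until it is usable.
save_image() {
    local instance_id name ami
    instance_id=$1
    name=$2
    # --query extracts the new image ID from the response
    ami=$(aws ec2 create-image \
        --instance-id "$instance_id" \
        --name "$name" \
        --query 'ImageId' --output text) || return 1
    # Polls until the image reaches the "available" state
    aws ec2 wait image-available --image-ids "$ami" || return 1
    echo "$ami"
}
```

The printed ID is the one to use as --image-id the next time you launch an instance.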

Terminate the instance

Once you have logged out of your instance, terminate it with

aws ec2 terminate-instances --instance-ids my-instance-id

which will delete any data not saved when you made the image. When you launch a new instance from the image you just created, you will have access to everything the image contains.

If you prefer, you can stop the instance instead of terminating it. This will maintain any data on the drive which is not saved in the image, but you will incur storage costs from Amazon. This page currently quotes a price of $0.08/GB-month. For my GPU instance with 100GB of storage, this is a not-unreasonable $8.00/month for a dedicated workstation. Still, I don’t mind terminating my instance after use, and keeping only basic software saved on the image.
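The arithmetic is simple enough to script if you want to compare scenarios. The sketch below uses the $0.08/GB-month and $0.50/hour figures quoted in this post; monthly_cost is a hypothetical helper and the 20 hours of use is just an example number.

```shell
# Hypothetical helper: estimated monthly cost in dollars.
# Arguments: storage in GB, price per GB-month, hours of use, price per hour
monthly_cost() {
    awk -v gb="$1" -v rate="$2" -v hours="$3" -v hourly="$4" \
        'BEGIN { printf "%.2f\n", gb * rate + hours * hourly }'
}

monthly_cost 100 0.08 0 0.50    # storage only: prints 8.00
monthly_cost 100 0.08 20 0.50   # storage plus 20 hours of GPU time: prints 18.00
```

Prices change, so it is worth plugging in the current numbers from the pricing pages before relying on the result.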

Automating the process

This is where the CLI approach really pays off. We’ll write a short shell script to automatically start an instance and connect. This means that in the future, you can connect with just one short command!

I wrote the following script, which you can use with minor edits, and saved it in ~/scripts/start_aws.sh. You can save it anywhere convenient on your machine.

# Start an instance 
function myrs-run() {
    # Run an instance and export the instance id
    id=$(aws ec2 run-instances \
        --image-id my-ami-id \
        --count 1 \
        --instance-type g4dn.xlarge \
        --key-name my-key \
        --security-groups my-security-group \
        --query 'Instances[0].InstanceId' \
        --output text)
    echo "Instance ID: $id, saved to \$id"
    # Wait until the instance is running; the public DNS name may still be
    # empty while the instance is pending
    echo "Waiting for instance to be running..."
    while STATE=$(aws ec2 describe-instances \
        --instance-ids "$id" \
        --output text \
        --query 'Reservations[*].Instances[*].State.Name'); \
        test "$STATE" != "running"; do
        sleep 2;
    done;
    echo "Instance is running."
    # Save the DNS to facilitate ssh
    dns=$(aws ec2 describe-instances \
        --instance-ids "$id" \
        --query 'Reservations[0].Instances[0].PublicDnsName' \
        --output text)
    echo "Public DNS: $dns, saved to \$dns"
}

# Simple login
function myrs-ssh() {
    echo "Logging in to instance..." 
    ssh -i path-to-key/my-key.pem ubuntu@$dns
}

# Terminate instances and check for any instances still running
function myrs-term() {
    echo "Terminating instance..."
    aws ec2 terminate-instances --instance-ids $id
    echo "The following instances are still running:"
    aws ec2 describe-instances \
        --filters "Name=instance-state-name,Values=running" \
        --query 'Reservations[*].Instances[*].{ID:InstanceId,Type:InstanceType,State:State.Name}' \
        --output text
}

Then, add the line

source $HOME/scripts/start_aws.sh

to your .zshrc, .bashrc, .bash_profile, or whatever file is used to configure your shell, adjusting the filepath as necessary.

Now, I can run myrs-run (for my remote server) to start my instance, myrs-ssh to log in, and myrs-term to terminate the instance when I am finished. I use scp to move any data or scripts I need to my instance at the start of each session, but sometimes I find it easiest to edit code locally in one window and use vim-slime to send chunks to a REPL open on the remote server.
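The scp step can get the same treatment as the other commands. Here is a possible companion function, offered as a sketch: myrs_put and the MYRS_KEY variable are my own hypothetical names, and it assumes $dns was set by myrs-run in the same shell session and that the image uses the ubuntu user.

```shell
# Hypothetical helper: copy local files to the home directory on the instance.
# Assumes $dns was set by myrs-run and MYRS_KEY (optional) points at your key file.
myrs_put() {
    if [ -z "${dns:-}" ]; then
        echo "No instance address in \$dns; start an instance first." >&2
        return 1
    fi
    # A remote path of "host:" with nothing after the colon means the home directory
    scp -i "${MYRS_KEY:-path-to-key/my-key.pem}" "$@" "ubuntu@${dns}:"
}
```

Usage would look like myrs_put analysis.jl data.csv right after myrs-run finishes.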