Creating a custom Deep Learning container for AWS SageMaker training

Santhosh Puttaraju
3 min read · Dec 2, 2020

Amazon SageMaker provides a suite of built-in algorithms to help data scientists and machine learning practitioners get started on training and deploying machine learning models quickly. For most use cases, you can use the built-in algorithms and frameworks without worrying about containers at all. Under the hood, SageMaker provides Docker containers for its built-in algorithms and prebuilt Docker images for some of the most common machine learning frameworks, such as Apache MXNet, TensorFlow, PyTorch, and Chainer.

But there are cases where you need specific Python libraries that are not present in the base container image. For this, SageMaker allows you to extend one of its prebuilt containers.

Here, we’ll extend a PyTorch container to install the Snowflake Connector for Python (snowflake-connector-python) and other libraries. The steps below show how to extend one of the prebuilt containers to build your own custom container for PyTorch.

Please note that in addition to the usual AmazonSageMakerFullAccess permissions, you will need permission to create a new repository in Amazon ECR. For this, you can simply add the managed policy AmazonEC2ContainerRegistryFullAccess to the role that you used to start your notebook instance.
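
If you prefer the CLI to the IAM console, attaching that policy could look like the sketch below; <your-notebook-role> is a placeholder for the execution role of your notebook instance.

```bash
# Attach the managed ECR policy to the notebook instance role.
# <your-notebook-role> is a placeholder -- use your actual role name.
aws iam attach-role-policy \
  --role-name <your-notebook-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess
```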

1. Open your SageMaker notebook instance, open a Terminal window, and cd into the ~/SageMaker directory.

2. Get a base PyTorch Docker image from this repo: https://github.com/aws/sagemaker-pytorch-training-toolkit

Clone this repo into your SageMaker instance, as shown below.
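
From the terminal, steps 1 and 2 together amount to something like:

```bash
# Work inside the notebook instance's persistent SageMaker directory.
cd ~/SageMaker
git clone https://github.com/aws/sagemaker-pytorch-training-toolkit.git
```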

3. On the notebook instance, navigate to the sagemaker-pytorch-training-toolkit directory and create a requirements.txt file.

Add all the libraries that you want available for model training, as in the example below.
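
A minimal requirements.txt might look like this; the package list is illustrative, so include whatever your training code needs:

```
snowflake-connector-python
pandas
scikit-learn
```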

4. Navigate to the Dockerfile in the cloned repo directory.

5. Open the Dockerfile and add lines along the lines of the sketch below. The environment variables for setting proxies are required only if you are using a firewalled AWS account (like your office account); the masked <user>:<password> part is the credential pair that grants access to the proxy server.
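
The original screenshot is not reproduced here, so this is a sketch of the kind of lines to add; the proxy values are placeholders, and the COPY/RUN pair assumes requirements.txt sits in the build context next to the Dockerfile:

```dockerfile
# Proxy settings -- needed only behind a corporate firewall.
# <user>:<password>@<proxy-host>:<port> are placeholders for your proxy details.
ENV http_proxy="http://<user>:<password>@<proxy-host>:<port>"
ENV https_proxy="http://<user>:<password>@<proxy-host>:<port>"

# Install the additional libraries from step 3.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```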

6. Run the docker build command. Again, the proxy settings need to be set/passed only if your environment requires them.
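
A typical invocation, run from the directory containing the Dockerfile (the image name sagemaker-pytorch-extended is just an example; Docker treats the proxy build args as predefined, so no ARG declaration is needed):

```bash
# Build the custom image; drop the --build-arg lines if you have no proxy.
docker build \
  --build-arg http_proxy="http://<user>:<password>@<proxy-host>:<port>" \
  --build-arg https_proxy="http://<user>:<password>@<proxy-host>:<port>" \
  -t sagemaker-pytorch-extended:latest .
```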

7. In the AWS console, navigate to ECR and create a repository.
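
The same step can be done from the terminal; the repository name and region below are assumptions, so match them to your setup:

```bash
aws ecr create-repository \
  --repository-name sagemaker-pytorch-extended \
  --region us-east-1
```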

8. Back in the terminal, log in to Docker using the AWS CLI.
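
With AWS CLI v2 the login looks like this; <account-id> and the region are placeholders:

```bash
# Fetch an ECR auth token and pipe it straight into docker login.
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
```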

9. Now that the Docker image has been built locally, tag it (using either the image ID from docker images or the local name) and push it to ECR under its full repository name.
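
For example, with the same placeholder account ID and repository name as above:

```bash
# Tag the local image with the full ECR URI, then push it.
docker tag sagemaker-pytorch-extended:latest \
  <account-id>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-pytorch-extended:latest
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-pytorch-extended:latest
```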

10. You can validate by running the container from the image and trying to import one of the custom libraries.
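
One quick smoke test, assuming snowflake-connector-python was in your requirements.txt (--entrypoint overrides any entrypoint the base image defines):

```bash
# Import the connector inside the container; seeing "import ok" means success.
docker run --rm --entrypoint python sagemaker-pytorch-extended:latest \
  -c "import snowflake.connector; print('import ok')"
```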

Please leave your questions and comments below.
