Machine Learning Docker Template

This post was originally published on my website adamnovotny.com

Summary

  1. All data scientists on a team can quickly set up an identical, Docker-based development environment that encourages good software engineering practices.
  2. Dependency management is handled during the environment’s startup by Miniconda and requires minimal manual changes.
  3. Notebooks are encouraged for exploration. For production use, however, notebooks must be version controlled, parametrized, and run with Papermill.
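
To make the third point concrete, here is a minimal sketch of running a parametrized notebook with Papermill. The helper names, the notebook path, and the parameter names are my own placeholders for illustration, not part of the template. It assumes the notebook has a cell tagged `parameters`, which is where Papermill injects the values.

```python
from pathlib import Path


def output_path_for(notebook: str, out_dir: str, run_name: str) -> Path:
    """Return where the executed copy of `notebook` is written for this run."""
    stem = Path(notebook).stem
    return Path(out_dir) / f"{stem}_{run_name}.ipynb"


def run_notebook(notebook: str, out_dir: str, run_name: str, parameters: dict) -> Path:
    """Execute `notebook` with Papermill, injecting `parameters` (hypothetical helper)."""
    # Imported lazily so the path helper above can be used without Papermill installed.
    import papermill as pm

    out = output_path_for(notebook, out_dir, run_name)
    out.parent.mkdir(parents=True, exist_ok=True)
    # The input notebook is left unchanged; the executed copy is saved to `out`.
    pm.execute_notebook(str(notebook), str(out), parameters=parameters)
    return out
```

A call such as `run_notebook("notebooks/explore.ipynb", "runs", "lr_0.01", {"learning_rate": 0.01})` would then leave the source notebook untouched and store the executed copy, with its parameters, under `runs/`. Keeping one output file per run makes it easier to compare results across parameter settings.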

Code

File structure
1. The Dockerfile defines the development environment and uses Miniconda as the base image:
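# Start from the official Miniconda image
FROM continuumio/miniconda3
...
# Create the conda environment described in conda.yml
RUN conda env create -f conda.yml
# Activate the "dev" environment in new interactive shells
RUN echo "source activate dev" > ~/.bashrc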

2. conda.yml is used for dependency management and includes standard data science packages.

3. The ml_docker_template package should contain all production code, packaged so that an external system can install and run it. The code can then be developed locally and still run easily on an external machine when model training needs more compute power or deployment requires additional permissions.
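
As a rough illustration of what "installed and run by an external system" can look like, below is a sketch of an entry-point module that could live inside the package. The module layout, the `--data-path` and `--epochs` options, and the `train` placeholder are all assumptions of mine; the real package contents may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical option names)."""
    parser = argparse.ArgumentParser(prog="ml_docker_template")
    parser.add_argument("--data-path", required=True, help="location of the training data")
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    return parser.parse_args(argv)


def train(data_path: str, epochs: int) -> dict:
    """Placeholder for real training logic; returns a summary of the run."""
    return {"data_path": data_path, "epochs": epochs, "status": "finished"}


def main(argv=None) -> dict:
    """Entry point an external system can call after installing the package."""
    args = parse_args(argv)
    return train(args.data_path, args.epochs)
```

Because `main` accepts an explicit argument list, the same code path can be exercised from a local shell, a test, or a remote training job, for example `main(["--data-path", "data/train.csv", "--epochs", "3"])`.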