Getting Started with Docker and PyTorch
Docker has become an essential tool for machine learning engineers, providing consistent environments across different machines and making it easy to share and deploy models. In this guide, we'll walk through setting up Docker on macOS, creating a PyTorch container, and running a simple training example.
What is Docker?
Docker is a platform that packages applications and their dependencies into lightweight, portable containers. Think of containers as isolated environments that include everything needed to run your application - code, runtime, libraries, and system tools.
Why Use Docker for Machine Learning?
- Reproducibility: Your code runs the same way on any machine
- Isolation: Dependencies don't conflict with your system or other projects
- Portability: Share your environment with teammates or deploy to cloud services
- Version Control: Track different versions of your ML environment
- GPU Support: Easy access to CUDA and GPU resources
Prerequisites
Before we begin, make sure you have:
- macOS 10.15 or later
- At least 4GB of RAM (8GB+ recommended)
- 20GB of free disk space
- Admin access to your Mac
Step 1: Install Docker Desktop for Mac
Docker Desktop is the easiest way to run Docker on macOS.
Installation Steps
1. Visit the Docker Desktop download page: https://www.docker.com/products/docker-desktop
2. Download Docker Desktop for Mac (choose Apple Silicon or Intel, depending on your Mac)
3. Open the downloaded .dmg file and drag Docker to your Applications folder
4. Launch Docker from Applications
5. Follow the setup wizard and grant the necessary permissions
Verify Installation
Open Terminal and run:
docker --version
docker run hello-world
You should see the Docker version and a welcome message confirming the installation.
Step 2: Understanding Docker Concepts
Before we dive into PyTorch, let's understand key Docker concepts:
Images vs Containers
- Image: A blueprint or template (like a class in programming)
- Container: A running instance of an image (like an object)
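To make the distinction concrete, here is a small, optional terminal session (the container names job1 and job2 are just illustrative):
docker pull python:3.10-slim                                                    # downloads an image (the blueprint)
docker run --name job1 python:3.10-slim python -c "print('hello from job1')"   # first container created from that image
docker run --name job2 python:3.10-slim python -c "print('hello from job2')"   # second, independent container from the same image
docker ps -a                                                                    # both containers appear; docker images still shows one image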
Dockerfile
A text file with instructions to build a Docker image. It defines:
- Base image to start from
- Dependencies to install
- Files to copy
- Commands to run
Docker Hub
A registry where you can find pre-built images, including official PyTorch images.
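For instance, instead of building your own image you could pull the official pytorch/pytorch repository straight from Docker Hub (in Step 3 we build a slimmer custom image instead):
docker pull pytorch/pytorch
docker run --rm pytorch/pytorch python -c "import torch; print(torch.__version__)"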
Step 3: Create a PyTorch Dockerfile
Let's create a custom Docker image with PyTorch and all necessary dependencies.
Create a new directory for your project:
mkdir pytorch-docker-demo
cd pytorch-docker-demo
Create a file named Dockerfile:
FROM python:3.10-slim
WORKDIR /app
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir \
torch==2.1.0 \
torchvision==0.16.0 \
numpy==1.24.3 \
matplotlib==3.7.1 \
scikit-learn==1.3.0
COPY . /app
CMD ["python", "train.py"]
Dockerfile Breakdown
- FROM python:3.10-slim: Start from a lightweight Python 3.10 image
- WORKDIR /app: Set the working directory inside the container
- RUN apt-get update ...: Install system dependencies
- RUN pip install ...: Install Python packages
- COPY . /app: Copy your code into the container
- CMD: Default command to run when the container starts
Step 4: Create a Training Script
Create a file named train.py with a simple MNIST classifier:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import time
import os
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
print("Downloading MNIST dataset...")
train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)
test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    download=True,
    transform=transform
)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
model = SimpleNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    for batch_idx, (data, target) in enumerate(loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()
        total += target.size(0)
        if batch_idx % 100 == 0:
            print(f"Batch {batch_idx}/{len(loader)}, Loss: {loss.item():.4f}")
    avg_loss = total_loss / len(loader)
    accuracy = 100. * correct / total
    return avg_loss, accuracy
def test(model, loader, criterion, device):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(loader)
    accuracy = 100. * correct / len(loader.dataset)
    return test_loss, accuracy
print("Starting training...")
num_epochs = 3
for epoch in range(num_epochs):
    start_time = time.time()
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    test_loss, test_acc = test(model, test_loader, criterion, device)
    epoch_time = time.time() - start_time
    print(f"\nEpoch {epoch + 1}/{num_epochs}")
    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
    print(f"Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.2f}%")
    print(f"Time: {epoch_time:.2f}s\n")
print("Training completed!")
print(f"Final Test Accuracy: {test_acc:.2f}%")
# Save into models/ so the file lands in the directory we mount in Step 6
os.makedirs('models', exist_ok=True)
torch.save(model.state_dict(), 'models/mnist_model.pth')
print("Model saved to models/mnist_model.pth")
Step 5: Build the Docker Image
Now let's build our Docker image:
docker build -t pytorch-mnist .
This command:
- docker build: Builds an image from a Dockerfile
- -t pytorch-mnist: Tags the image with the name "pytorch-mnist"
- .: Uses the current directory as the build context
The build process will take a few minutes as it downloads dependencies.
Verify the Image
List your Docker images:
docker images
You should see pytorch-mnist in the list.
Step 6: Run the Container
Run your training inside a Docker container:
docker run --rm pytorch-mnist
Flags explained:
- --rm: Automatically removes the container when it exits
- pytorch-mnist: Name of the image to run
Run with Volume Mounting
To persist data and models outside the container:
docker run --rm -v $(pwd)/data:/app/data -v $(pwd)/models:/app/models pytorch-mnist
This mounts local directories into the container, so downloaded data and saved models persist.
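If you want to double-check the persistence, you can create the host directories up front and inspect them after the run (an optional sanity check; the paths simply mirror the mount flags above):
mkdir -p data models
docker run --rm -v $(pwd)/data:/app/data -v $(pwd)/models:/app/models pytorch-mnist
ls models/    # mnist_model.pth should now exist on the host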
Step 7: Interactive Development
For development, you might want to run the container interactively:
docker run -it --rm -v $(pwd):/app pytorch-mnist bash
This opens a bash shell inside the container where you can:
- Run Python scripts manually
- Install additional packages
- Debug issues
- Explore the environment
Inside the container, you can run:
python train.py
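Before kicking off a full training run, a quick sanity check of the environment might look like this (purely optional):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"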
Advanced: Docker Compose
For more complex setups, use Docker Compose. Create docker-compose.yml:
version: '3.8'
services:
  pytorch-training:
    build: .
    volumes:
      - ./data:/app/data
      - ./models:/app/models
      - ./logs:/app/logs
    environment:
      - PYTHONUNBUFFERED=1
    command: python train.py
Run with (recent Docker Desktop versions ship Compose as the docker compose subcommand; older setups use the standalone docker-compose binary):
docker compose up
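Compose can also run one-off commands in the same service environment, which is handy for quick checks:
docker compose run --rm pytorch-training python -c "import torch; print(torch.__version__)"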
Common Docker Commands
Here are essential Docker commands you'll use frequently:
docker ps                                # list running containers
docker ps -a                             # list all containers, including stopped ones
docker images                            # list local images
docker rm <container-id>                 # remove a stopped container
docker rmi <image-id>                    # remove an image
docker logs <container-id>               # show a container's output
docker exec -it <container-id> bash      # open a shell inside a running container
docker stop <container-id>               # stop a running container
docker system prune                      # clean up unused containers, networks, and dangling images
Troubleshooting
Container Exits Immediately
Check logs:
docker logs <container-id>
Out of Memory
Increase Docker Desktop memory allocation:
- Open Docker Desktop
- Go to Settings → Resources
- Increase Memory limit to 8GB or more
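You can also reduce memory pressure from inside the script, for example by lowering the DataLoader batch size in train.py (32 is just an illustrative value):
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)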
Slow Performance on Mac
Docker on Mac uses virtualization, which can be slower than native Linux. Consider:
- Using Docker volumes instead of bind mounts
- Increasing allocated resources
- Using a .dockerignore file to exclude unnecessary files
Permission Issues
On macOS, Docker Desktop manages permissions itself, so permission errors are usually resolved by restarting Docker Desktop and re-granting the access it requests on first launch. On a Linux host, you would instead add your user to the docker group:
sudo usermod -aG docker $USER
Then log out and back in.
Best Practices
1. Use .dockerignore
Create a .dockerignore file to exclude unnecessary files:
__pycache__
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
.git
.gitignore
.DS_Store
*.ipynb_checkpoints
data/
models/
logs/
2. Multi-stage Builds
For production, use multi-stage builds to reduce image size:
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "train.py"]
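This multi-stage Dockerfile assumes a requirements.txt sitting next to it; one that mirrors the versions pinned in Step 3 would look like:
torch==2.1.0
torchvision==0.16.0
numpy==1.24.3
matplotlib==3.7.1
scikit-learn==1.3.0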
3. Pin Package Versions
Always specify exact versions in your Dockerfile to ensure reproducibility.
4. Layer Caching
Order Dockerfile commands from least to most frequently changing to maximize cache usage.
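Applied to the Step 3 Dockerfile, that means copying and installing requirements.txt before copying the rest of the project, so code-only edits reuse the cached dependency layer (a sketch, assuming the requirements.txt shown above):
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app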
5. Security
- Don't run containers as root in production
- Scan images for vulnerabilities
- Use official base images
- Keep images updated
Next Steps
Now that you have a working Docker + PyTorch setup, you can:
- Add GPU Support: Use the NVIDIA container runtime for GPU acceleration (see the sketch after this list)
- Deploy to Cloud: Push images to Docker Hub or AWS ECR
- Orchestration: Learn Kubernetes for managing multiple containers
- CI/CD: Integrate Docker into your continuous integration pipeline
- Experiment Tracking: Add MLflow or Weights & Biases to your container
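Note that Docker Desktop for Mac does not pass GPUs through to containers. On a Linux host with an NVIDIA GPU, up-to-date drivers, and the NVIDIA Container Toolkit installed, a GPU-enabled run would look roughly like this (the image tag is illustrative; check Docker Hub for current ones):
docker run --rm --gpus all pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime python -c "import torch; print(torch.cuda.is_available())"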
Conclusion
Docker provides a powerful way to containerize your machine learning workflows, ensuring consistency across development, testing, and production environments. By following this guide, you've learned how to:
- Install Docker Desktop on macOS
- Create a custom PyTorch Docker image
- Run training inside a container
- Use volumes for data persistence
- Apply best practices for Docker in ML
The example MNIST classifier demonstrates the basics, but the same principles apply to complex deep learning projects. Start containerizing your ML workflows today and enjoy the benefits of reproducible, portable environments!
Happy containerizing! 🐳🔥