
Smaller Docker Images Using Multi-Stage Builds

Gregory Schier • 5 min read

Docker 17.05 introduced a feature called multi-stage builds, which greatly simplifies the process of optimizing Docker images. This post gives an overview of multi-stage builds and shows how to use them to simplify Dockerfiles and dramatically reduce image size. 🥳

Docker multi-stage builds

Here's what it looks like:

Dockerfile
# Compile in the full Go image
FROM golang:1.14-alpine AS compile
WORKDIR /app
ADD . .
RUN go build -o myapp .

# Copy artifacts to minimal runtime image
FROM alpine:3.11
COPY --from=compile /app/myapp ./myapp
CMD ["./myapp"]

Compilation vs. Runtime

A common way to cut down the size of a Docker image is to split its dependencies into two categories, compilation and runtime, and then remove the compile-time dependencies once the build is done.

Compilation Dependencies: required to compile the project
Runtime Dependencies: required to run the compiled executable

Dependency cleanup is, unfortunately, tedious and error-prone, and it quickly becomes a maintenance nightmare. Multi-stage builds eliminate the need to clean up dependencies altogether.
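To make the pain concrete, here is a rough sketch of the cleanup approach applied to a simple Go program, written as a single-stage Dockerfile. It is an illustration, not something taken from a real project: it assumes Alpine's own `go` package is an acceptable compiler (it is an older Go release than the one in `golang:1.14-alpine`) and that the project has no unusual build requirements. Install, build, and cleanup all have to share one `RUN` instruction, because every instruction creates its own image layer, and files deleted in a later layer still take up space in the earlier one.

```dockerfile
# Single-stage build that cleans up after itself (illustrative sketch)
FROM alpine:3.11
WORKDIR /app
COPY . .

# Install the compiler under a named virtual package so it can be removed as a unit,
# build the binary, then delete the compiler plus Go's build cache and module cache.
# Splitting this across several RUN lines would leave the compiler in an earlier layer.
RUN apk add --no-cache --virtual .build-deps go \
    && go build -o myapp . \
    && apk del .build-deps \
    && rm -rf /root/.cache/go-build /root/go

CMD ["./myapp"]
```

Even then, the source code copied into `/app` still ships with the image, and the cleanup list has to be kept in sync with every new build dependency. That bookkeeping is exactly what a multi-stage build avoids.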

Multi-Stage Builds

Docker's multi-stage builds get around dependency cleanup by making it possible to use separate images for compilation and runtime. This means we can simply copy the necessary build artifacts from the compilation image (very large) into a different, very small runtime image.

Go Example

Here's an example of a multi-stage Dockerfile for a simple Go program.

Dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation Image (~500 MB)
#
FROM golang:1.14-alpine AS backend

# Work in /app so the binary ends up at /app/myapp
WORKDIR /app

# Add source code
ADD . .

# Do the compilation
RUN go build -o myapp .

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image (~30 MB)
#
# Very minimal base image
FROM alpine:3.11

# Copy compiled binary from previous image
COPY --from=backend /app/myapp ./myapp

CMD ["./myapp"]

For reference, the alpine Docker image is 6MB and the golang:alpine image is 300MB. That's 294MB of space saved just by starting from a different base image – no dependency management required! 🤯

For comparison, here is the unoptimized, single-stage version (the worst-case scenario).

Dockerfile (unoptimized)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation AND Runtime Image (490MB)
#
FROM golang:1.14-alpine
WORKDIR /app
ADD . .
RUN go build -o myapp .
CMD ["./myapp"]

Using a multi-stage build for the Go example above dropped the image size from 490MB to just 26MB! In fairness, similar results are possible by carefully managing and cleaning up dependencies, but honestly I wouldn't know how to do that without hours of unnecessary research. Multi-stage builds are a much better solution.

Frontend/Backend Example

Multi-stage builds don't have to be linear. Most web apps, for example, need to compile multiple components, with the most obvious split being between frontend and backend. To achieve this, we can create a compilation image for each, then copy assets from both places into the final runtime image.

Backend and Frontend Setup

Here's a real-world example from my personal blog that uses a node base image to build the frontend assets and a golang base image to build the backend server. From there, the runtime image copies the compiled artifacts into itself.

Dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Frontend Image 
#
FROM node:12-alpine AS frontend
ADD . ./app
WORKDIR /app/frontend
RUN npm install && npm run build

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Backend Image
#
FROM golang:1.14-alpine AS backend
WORKDIR /app
ADD . .
RUN go install ./...

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image
#
FROM alpine:3.11
COPY --from=frontend /app/frontend/static ./frontend/static
COPY --from=backend /go/bin/web ./web
CMD ["./web"]

The resulting Dockerfile is clean, easy to understand, and trivial to maintain.

Python and Interpreted Languages

The examples above used Go because it has a clear separation between compile-time and runtime dependencies, but what about more complicated situations? Python, for example, requires the Python interpreter at runtime. Python packages also often depend on C libraries for performance-critical tasks like database interaction and image manipulation. In general, the strategy for Python is the same as for Go, but it is complicated by the fact that Python apps have many more runtime dependencies to consider.

Here's what a multi-stage build might look like for a Python Django web project.

Dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation Image
#
FROM python:3.8-alpine AS compile

# Install dependencies required to compile Python packages
RUN apk add --no-cache postgresql-dev libffi-dev musl-dev gcc

# Create Python virtualenv and use it
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Add requirements file
ADD requirements.txt .

# Install requirements (save space by not caching)
RUN pip install -r requirements.txt --no-cache-dir

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image
#
# Note: Python base is unfortunately large, but required
FROM python:3.8-alpine 

# Install required runtime dependencies
RUN apk add --no-cache postgresql-libs

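# Copy dependencies from the virtualenv and use it
# (copied before the code so this layer stays cached when only the code changes)
COPY --from=compile /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy code into image
# TIP: use .dockerignore to exclude large items or secrets
WORKDIR /app
COPY . .

# Define command to run Django app (exec form, so daphne receives stop signals directly)
CMD ["daphne", "--port", "8000", "--bind", "0.0.0.0", "project.asgi:application"]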

You'll notice that separating compilation and runtime dependencies for Python apps is more complicated than for Go. The runtime image is around 200MB, whereas our Go image was 26MB. This is an unfortunate side effect of interpreted languages needing more runtime dependencies, and it can't really be avoided. However, it's still better than an unoptimized Python image, which would be upwards of 500MB.

BONUS: Use .dockerignore

Docker's .dockerignore file should be used in conjunction with multi-stage builds. It excludes files and folders from the build context, so they never reach the ADD or COPY commands. Things like cache directories, Git data, and documentation take up lots of space and should be ignored. On top of that, files containing secrets or sensitive information should be ignored so they won't be distributed inside the image (especially if the image is publicly accessible).
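As a rough illustration, a `.dockerignore` for projects like the ones above might look like the sketch below. The entries are assumptions rather than a definitive list: `docs` and `.env` are hypothetical names standing in for a documentation folder and a secrets file. Patterns are matched relative to the root of the build context, `**` matches any number of directories, and lines starting with `#` are comments.

```text
# Version control data
.git

# Dependency and cache directories (rebuilt inside the image anyway)
**/node_modules
**/__pycache__

# Documentation (hypothetical folder name)
docs

# Secrets (hypothetical file names)
.env
**/*.pem
```

Because ignored files never reach the build context, neither `ADD` nor `COPY` can pick them up, which also makes a broad `COPY . .` like the one in the Python example safer.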

Wrap-Up

That's really all there is to know about Docker's multi-stage builds. It's one of my favorite features of modern Docker, and I still can't believe how much easier it makes optimization compared to the bad old days.

