Docker 17.05 introduced a new feature called multi-stage builds, which greatly simplifies the process of optimizing Docker images. This post gives an overview of multi-stage builds and shows how to use them to write cleaner Dockerfiles and dramatically reduce image size. 🥳
Here's what it looks like:
```dockerfile
FROM golang:1.14-alpine AS compile
WORKDIR /app
ADD . .
RUN go build -o myapp .

# Copy artifacts to minimal runtime image
FROM alpine:3.11
COPY --from=compile /app/myapp ./myapp
CMD ["./myapp"]
```
## Compilation vs. Runtime
The simplest way to cut down the size of Docker images is to separate dependencies into two categories—compilation and runtime—then remove the compile-time dependencies after use.
Compilation Dependencies: required to compile the project
Runtime Dependencies: required to run the compiled executable
Dependency cleanup is, unfortunately, very difficult and quickly becomes a maintenance nightmare. Multi-stage builds eliminate the need to clean up dependencies altogether. ✨
Docker's multi-stage builds get around dependency cleanup by providing the ability to use separate images for compilation and run-time. This means that we can simply copy the necessary build artifacts from the compilation image (very large) to a different (very small) runtime image.
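For contrast, here's a rough sketch of the single-stage cleanup approach that multi-stage builds replace. The package names are illustrative; the exact compile-time packages vary by project:

```dockerfile
# Single-stage image: install build tools, compile, then manually
# remove the compile-time dependencies again.
FROM alpine:3.11
WORKDIR /app
ADD . .

# go and musl-dev are compile-time-only. They must be removed in the
# SAME RUN instruction, otherwise the layer still contains them and
# the image stays large.
RUN apk add --no-cache go musl-dev \
    && go build -o myapp . \
    && apk del go musl-dev

CMD ["./myapp"]
```

Every compile-time package has to be tracked and deleted by hand, in the right layer, without breaking anything the runtime still needs. That bookkeeping is exactly what multi-stage builds make unnecessary.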
Here's an example of a multi-stage Dockerfile for a simple Go program.
```dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation Image (~500 MB)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM golang:1.14-alpine AS backend
WORKDIR /app

# Add source code
ADD . .

# Do the compilation
RUN go build -o myapp .

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image (~30 MB)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Very minimal base image
FROM alpine:3.11

# Copy compiled binary from previous image
COPY --from=backend /app/myapp ./myapp

CMD ["./myapp"]
```
For reference, the `alpine` Docker image is ~6 MB and the `golang:alpine` image is ~300 MB. That's 294 MB of space saved just by starting from a different base image – no dependency management required! 🤯
For reference, here is the unoptimized version (worst-case scenario).
```dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation AND Runtime Image (~490 MB)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM golang:1.14-alpine
WORKDIR /app
ADD . .
RUN go build -o myapp .
CMD ["./myapp"]
```
Using multi-stage builds in the Go example above dropped our image size from 490MB to just 26MB! Before you get excited, it's important to remember that we could have achieved similar results by carefully managing and cleaning dependencies, but to be honest I wouldn't even know how to do that without hours of unnecessary research. Multi-stage builds are a much better solution.
Multi-stage builds don't have to be linear. Most web apps, for example, need to compile multiple components, with the most obvious split being between frontend and backend. To achieve this, we can create a compilation image for each, then copy assets from both places into the final runtime image.
Here's a real-world example from my personal blog that uses a `node` base image to build the frontend assets and a `golang` base image to build the backend server. The runtime image then copies the compiled artifacts from both stages into itself.
```dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Frontend Image
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM node:12-alpine AS frontend
ADD . ./app
WORKDIR /app/frontend
RUN npm install && npm run build

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Backend Image
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM golang:1.14-alpine AS backend
WORKDIR /app
ADD . .
RUN go install ./...

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM alpine:3.11
COPY --from=frontend /app/frontend/static ./frontend/static
COPY --from=backend /go/bin/web ./web
CMD ["./web"]
```
The resulting Dockerfile is clean, easy to understand, and trivial to maintain.
## Python and Interpreted Languages
The above example used Go because it has a clear separation between compilation and runtime dependencies, but what about more complicated situations? Python, for example, requires the Python interpreter for execution. Python packages also often depend on C libraries for performance-critical tasks like database interaction and image manipulation. The strategy for Python is the same as for Go, in general, but is complicated by the fact that Python apps have many more runtime dependencies to consider.
Here's what a multi-stage build might look like for a Python Django web project.
```dockerfile
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Compilation Image
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM python:3.8-alpine AS compile

# Install dependencies required to compile Python packages
RUN apk add --no-cache postgresql-dev libffi-dev musl-dev gcc

# Create Python virtualenv and use it
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Add requirements file
ADD requirements.txt .

# Install requirements (save space by not caching)
RUN pip install -r requirements.txt --no-cache-dir

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Runtime Image
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Note: the Python base is unfortunately large, but required
FROM python:3.8-alpine

# Install required runtime dependencies
RUN apk add --no-cache postgresql-libs

# Copy code into image
# TIP: use .dockerignore to exclude large items or secrets
COPY . .

# Copy dependencies from virtualenv and use it
COPY --from=compile /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Define command to run Django app
CMD daphne --port 8000 --bind 0.0.0.0 project.asgi:application
```
You'll notice that separating compilation and runtime dependencies is more complicated for Python than for Go. The runtime image comes in at around 200 MB, where our Go image was ~26 MB. This is an unavoidable side effect of interpreted languages needing their interpreter (and its runtime dependencies) inside the image. However, it's still far better than an unoptimized Python image, which would be upwards of 500 MB.
`.dockerignore` should be used in conjunction with multi-stage builds. When adding files to an image using the `ADD` or `COPY` instructions, `.dockerignore` can exclude unnecessary folders and files. Things like cache directories, Git data, and documentation take up lots of space and should be ignored. On top of that, files containing secrets or sensitive information should be excluded so they won't be distributed inside the image (especially if it's publicly accessible).
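As a starting point, a `.dockerignore` for a project like the examples above might look like this (the entries are illustrative; adjust them to your project's layout):

```
# Version control and docs
.git
docs/

# Dependency and build caches
node_modules/
__pycache__/
*.pyc

# Secrets and local config -- never bake these into an image
.env
*.pem
```

The syntax mirrors `.gitignore`-style patterns: one pattern per line, matched against paths in the build context before they're sent to the Docker daemon.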
That's really all there is to know about Docker's multi-stage builds. It's one of my favorite features of modern Docker, and I still can't believe how much easier it makes image optimization compared to the bad old days.