In the previous post, we built a very minimal Dockerfile for a .NET Core application. It’s really simple and it works, so one might argue that it’s “good enough” already. However, there is a lot to be gained by working on this file just a little more, and the investment will pay off very soon.

For example, a well-crafted Dockerfile that takes advantage of caching can save you from restoring packages on every docker build. Instead, we can construct it so that packages are restored only when the csproj file changes. The same trick of course works for things like npm install, where it brings an even greater benefit!

Also, if we reduce the image size by taking advantage of multi-stage builds, containers will start faster and be much easier to move around. Not to mention that your SSD will be thankful, as it no longer needs to store unnecessary trash.

What’s really inside our image?

There’s one really cool command in Docker which allows you to execute things in the container context, much like opening an SSH session into a “container”. That means you can, for example, traverse the filesystem and see exactly the same things the containerized process does. Under the hood, it launches the given command within the container’s namespaces. The command we are talking about is docker exec and, in fact, you’ve had the pleasure of meeting it before in this series.

To find out, let’s start a container again from our image, this time with the -d flag, meaning “detach”. That’s just so it doesn’t block your terminal. Please note that this assumes you already have the hello_asp_net_core:latest image built. If not, take a look at my previous post.

$ docker run -d -p 5000:80 --name tutorial hello_asp_net_core:latest
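Since we used -d, the container is now running in the background. If you want to make sure it’s really up, docker ps lists the running containers:

$ docker ps

You should see the tutorial container there, with an “Up” status and the 5000->80 port mapping.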

Now, let’s “get into” the container (switch namespaces) using the handy docker exec command and run ls -lah to see what files are inside.

$ docker exec -it tutorial ls -lah
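The exact listing depends on your project, but for the sample app from the previous post it should look roughly like this (sizes, dates and permissions elided; names illustrative):

bin/
obj/
out/
Dockerfile
HelloAspNetCore.csproj
Program.cs
Startup.cs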

Ouch… see that? Our application sources are just lying there. We don’t want them here, not only because they’re completely unneeded, or because they bloat the container image, but mainly because shipping sources is bad practice from a security standpoint.

Also, if we fire:

$ docker images | grep hello_asp_net_core
hello_asp_net_core                  latest              ac62907deefe        About an hour ago   1.78GB

You’ll see that our container image weighs almost 2 gigs! Why is that, if there is only a tiny web server binary inside?

That’s because we packed the whole dotnet SDK into our image. Meaning, we ship the compiler and all the other SDK tooling along with our application. We need to fix that, as a bloated image makes the container slower to start and much harder to transfer over the network.

Fortunately, there is another image provided by Microsoft called runtime. It contains only the binaries necessary to run dotnet applications and is therefore much lighter. In fact, the difference is huge: at the time of writing it’s 1.74 GB for the SDK image and only 181 MB for the runtime!
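You can verify this yourself by pulling both images and comparing (the exact numbers will vary as Microsoft updates the tags):

$ docker pull mcr.microsoft.com/dotnet/core/sdk:2.2
$ docker pull mcr.microsoft.com/dotnet/core/runtime:2.2
$ docker images | grep "mcr.microsoft.com/dotnet/core"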

Fixing the Dockerfile

Okay, we’d like to achieve two things now: copy only the binaries and ship only the runtime with our image. However, we cannot simply switch the base image to runtime, as that would make building our application impossible. After all, we need the compiler at build time. How can we have the full SDK for the build, but leave only the runtime in the image?

At the time of writing, the easiest way to achieve this is to take advantage of multi-stage builds in Docker. The way it works is that you define temporary containers to be run as part of your docker build, and then copy only whatever you need into the output image. So the plan is: we’ll have a temporary container based on the full SDK image, do the build there, and then copy only the necessary binaries to the final image, which will use the runtime image as its base. Let’s take a look at how we can translate that into a Dockerfile:

FROM mcr.microsoft.com/dotnet/core/sdk:2.2 as temporary-build
WORKDIR /app
COPY . ./

RUN dotnet publish -c Release -o out

FROM mcr.microsoft.com/dotnet/core/runtime:2.2 as final
WORKDIR /app
COPY --from=temporary-build /app/out ./

ENTRYPOINT ["dotnet", "HelloAspNetCore.dll"]

Note that the last stage in a multi-stage build becomes your final image. The rest of the stages are thrown away after the build. Anyway, let’s bake this and check the image size again:

$ docker build -t hello_asp_net_core:latest .
$ docker images | grep hello_asp_net_core
hello_asp_net_core                      latest              a9386f3ca45e        2 minutes ago       198MB

We just went down from 1.78 GB to 198 MB. Good!
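As a bonus, the sources are gone as well. If you recreate the container from the new image and repeat our earlier experiment, you should only see the published binaries:

$ docker rm -f tutorial
$ docker run -d -p 5000:80 --name tutorial hello_asp_net_core:latest
$ docker exec -it tutorial ls -lah

This time there should be no .cs files, no csproj, no bin or obj directories — just HelloAspNetCore.dll and its companions from /app/out.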

Docker caching

Our docker image at this point is really close to perfection. We slimmed it down by a huge factor by removing all the unnecessary stuff. Yet, there is one more thing we can do to improve it even further: we can optimize the Dockerfile to take advantage of layer caching.

In our example, we’ll use it for restoring NuGet packages, so that if no new packages were added, the whole layer is retrieved from the cache instead of running a restore on each build. Smart usage of caching can drastically improve your build times, which has an even greater impact on CI servers.

But how does this cache actually work?

Remember how we talked about layers and how they are built? We’ve learnt that, for example, each COPY creates a new layer, and that layers are incremental, forming a kind of linked list. This is exactly what makes caching possible! To see how it works for yourself, just fire:

$ docker build -t hello_asp_net_core:latest .
Sending build context to Docker daemon  14.85kB
Step 1/8 : FROM mcr.microsoft.com/dotnet/core/sdk:2.2 as temporary-build
 ---> 08657316a4cd
Step 2/8 : WORKDIR /app
 ---> Using cache
 ---> d2d21b4fbc19
Step 3/8 : COPY . ./
 ---> Using cache
 ---> aa58ede05801
Step 4/8 : RUN dotnet publish -c Release -o out
 ---> Using cache
 ---> d81f0cb17198
Step 5/8 : FROM mcr.microsoft.com/dotnet/core/runtime:2.2 as final
 ---> b938a3374c58
Step 6/8 : WORKDIR /app
 ---> Using cache
 ---> 4b7fd6413523
Step 7/8 : COPY --from=temporary-build /app/out ./
 ---> Using cache
 ---> 12d52daf2448
Step 8/8 : ENTRYPOINT ["dotnet", "HelloAspNetCore.dll"]
 ---> Using cache
 ---> 0b3474d1f89a
Successfully built 0b3474d1f89a
Successfully tagged hello_asp_net_core:latest

If you followed this series and have already built this image at least once, you’ll see something similar to the output above. That is, not a single layer was built; instead, all of them were taken from the cache. Because of this, the build completed almost instantaneously! Now, let’s try to break the cache by executing:

$ touch test
$ docker build -t hello_asp_net_core:latest .
Sending build context to Docker daemon  15.36kB
Step 1/8 : FROM mcr.microsoft.com/dotnet/core/sdk:2.2 as temporary-build
 ---> 08657316a4cd
Step 2/8 : WORKDIR /app
 ---> Using cache
 ---> d2d21b4fbc19
Step 3/8 : COPY . ./
 ---> babbf87b3d09
Step 4/8 : RUN dotnet publish -c Release -o out
 ---> Running in 7512a36b4bb8
Microsoft (R) Build Engine version 16.2.32702+c4012a063 for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Restore completed in 522.69 ms for /app/HelloAspNetCore.csproj.
  HelloAspNetCore -> /app/bin/Release/netcoreapp2.2/HelloAspNetCore.dll
  HelloAspNetCore -> /app/out/
Removing intermediate container 7512a36b4bb8
 ---> b2afcefef518
Step 5/8 : FROM mcr.microsoft.com/dotnet/core/runtime:2.2 as final
 ---> b938a3374c58
Step 6/8 : WORKDIR /app
 ---> Using cache
 ---> 4b7fd6413523
Step 7/8 : COPY --from=temporary-build /app/out ./
 ---> Using cache
 ---> 12d52daf2448
Step 8/8 : ENTRYPOINT ["dotnet", "HelloAspNetCore.dll"]
 ---> Using cache
 ---> 0b3474d1f89a
Successfully built 0b3474d1f89a
Successfully tagged hello_asp_net_core:latest

See what happened there? The publish step was executed again, even though we only added a totally unrelated file! That’s because the file we added, although unrelated, invalidated the cache starting from the COPY . ./ line. The next layer had to be rebuilt from scratch as well, since its input changed.

However, if we examine the console output closely, we can see that the layers following dotnet publish used the cache again. Can you guess why?

That’s because adding the test file didn’t affect the output of the compilation at all. Exactly the same binaries were produced as when the file wasn’t there. Because of that, Docker saw that it already had such a layer with the binaries cached and used it. Clever, isn’t it?
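By the way, if you’d like to inspect the layer chain yourself, docker history shows every layer of an image along with the instruction that created it and its size:

$ docker history hello_asp_net_core:latest

Each row is one cacheable layer; whenever the input of an instruction changes, that layer and all layers built on top of it have to be recreated.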

Having learned the new concept, let’s adjust our Dockerfile now:

FROM mcr.microsoft.com/dotnet/core/sdk:2.2 as temporary-build
WORKDIR /app

COPY HelloAspNetCore.csproj ./
RUN dotnet restore

COPY . ./
RUN dotnet publish -c Release -o out

FROM mcr.microsoft.com/dotnet/core/runtime:2.2 as final
WORKDIR /app
COPY --from=temporary-build /app/out ./

ENTRYPOINT ["dotnet", "HelloAspNetCore.dll"]

Have you noticed what makes the difference there? We first copy the csproj by itself, and only then, after doing the restore, do we copy everything else and build the sources. Because of this little trick, if the csproj doesn’t change, the layer doesn’t change either. Next, for the RUN dotnet restore step, the cache match is performed purely on the command string. Since it didn’t change, that step is retrieved from the cache as well.

You can play with it a bit by changing source files, and you’ll see that dotnet restore is not performed again! Neat!
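For example, after touching one of the source files, the beginning of the build should go along these lines (layer hashes omitted; note there are 10 steps now):

$ touch Program.cs
$ docker build -t hello_asp_net_core:latest .
...
Step 3/10 : COPY HelloAspNetCore.csproj ./
 ---> Using cache
Step 4/10 : RUN dotnet restore
 ---> Using cache
Step 5/10 : COPY . ./
Step 6/10 : RUN dotnet publish -c Release -o out
 ---> Running in ...
...

The csproj copy and the restore come straight from the cache; only the source copy and the publish are executed again.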

Tip: if you have files which are not needed for the docker build, e.g. docs lying around in the repo, you can exclude them from the build context with a file called .dockerignore. See the docs for more details.
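For this project, a minimal .dockerignore could look like this (the entries are just examples, adjust them to your repo):

bin/
obj/
docs/
*.md
.git/

Excluding bin/ and obj/ also prevents artifacts built on your host machine from leaking into the image.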

Summary

This time we’ve learnt how to create production-grade Dockerfiles. We removed all the unnecessary files from the image, improving the security of our containers along the way. And since the image is much smaller, it can be moved around much more easily and started much faster!

Besides, we gave the builds themselves a big boost by wisely ordering the statements in the Dockerfile. Even the monstrous npm install can be made tolerable when combined with the Docker cache!
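To make that concrete, here is a sketch of the same trick applied to a Node.js app (assuming a standard npm project layout):

FROM node:10
WORKDIR /app

# Copy only the manifests first, so the npm install layer
# stays cached until package.json / package-lock.json change
COPY package*.json ./
RUN npm install

# Only now copy the rest of the sources
COPY . ./
CMD ["npm", "start"]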

In the next post, we’ll move on to the concept of container orchestration, with Kubernetes as the orchestrator. This will allow you to run your containers in production in a very manageable way. Stay tuned!