This is the fifth part of an ongoing series on configuring a new server. In our last post we discussed and configured some basic tools. For all intents and purposes, our 'root' system is complete. What we'll be doing now is building the infrastructure to run containers the way we want to.
Before we get around to setting things up, let's describe what we're up to.
Currently, the only external service the server is running is
ssh. We'd like our server to host more services. These services are likely to be both internal services, like databases, and external services, like HTTP/HTTPS hosts. Since the IPv4 space is increasingly crowded, we'd like to do all of this from one IPv4 address, too.
There are lots of ways to provide isolation for these services: virtual machines through something like Xen, containers through something like Docker, or even just chroots. Isolation isn't only about security; it's one way to help improve your security somewhat, but it isn't a silver bullet. Lightweight solutions like containers have very little overhead, though, so the benefits tend to outweigh the costs.
We'll use containers, because they're entertaining and fun to use. This is a laboratory, remember? We're supposed to have fun and play with new things.
Our First Container
We'll be using
machinectl to work with our containers. You may already know Docker, CoreOS's rkt, or LXC, and you're more than welcome to use any of those. The important thing is to use only one on this machine; you don't want to be managing multiple container providers. The good news is they're all pretty much compatible with one another, so if you use machinectl you can still use Docker images.
To get started with our first container, we first need to construct one. We'll make a scratch directory and put it in there. We'll tell
pacstrap, which came as part of
arch-install-scripts, to ignore anything that's likely unnecessary in a container.
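A sketch of the sort of invocation, assuming we build into ~/scratch (the flag choices here are assumptions; adjust to taste):

```bash
# Build a minimal Arch root in ~/scratch.
#   -i  prompt interactively, so we can deselect unneeded packages
#   -c  use the host's package cache instead of re-downloading
#   -d  allow installing into a plain directory rather than a mountpoint
mkdir ~/scratch
pacstrap -i -c -d ~/scratch base
# At the prompt, skip packages like linux, vi, and iptables (see below).
```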
This should give you an installed package size of under 400MB. That's pretty decent. We could probably go smaller, but it doesn't really matter. We don't need to worry about setting up things like iptables since the container is already protected by the host, and we don't need to have things like vi because we'll do our editing from outside of the container. We don't need things like linux because we're using the host kernel.
Take a look in ~/scratch and you should see what appears to be a base Arch install, like we had before. We're going to use chroot to enable some services by default; finally, we're going to import it into machinectl.
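A sketch of those steps; the particular services enabled here (systemd-networkd and systemd-resolved) are assumptions:

```bash
# Enable basic networking inside the image.
chroot ~/scratch systemctl enable systemd-networkd systemd-resolved
# Pack it up and import it as an image named 'base'.
tar --create --file base.tar --directory ~/scratch .
machinectl import-tar base.tar base
```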
After a second, this command will be done and we'll have our first image.
machinectl list-images shows this:
```
NAME TYPE      RO  USAGE  CREATED                     MODIFIED
base subvolume yes 277.6M Wed 2016-05-11 15:52:43 PDT n/a

1 images listed.
```
We can test out the image with
machinectl start base. If you get an "Operation not supported" error here, it's because you updated your kernel and haven't restarted. Later we'll talk about
kexec to help with this problem.
You can get a shell into the container with
machinectl shell base. You'll probably notice there is no network connection. That's because our firewall is blocking us! You can check this by observing
iptables -nvL and watching the packet counts go up on the rejecting rules. To fix it, in our /etc/iptables/iptables.rules we can add the lines marked with +:
```diff
 # Accept anything from the local loopback.
 --append INPUT --in-interface lo --jump ACCEPT
+# Accept anything originating from a container.
+--append INPUT --in-interface ve-+ --jump ACCEPT
+--append FORWARD --in-interface ve-+ --jump ACCEPT
```
These rules will make it so any communications coming from our container adapters will be accepted. Now reload the firewall with
systemctl restart iptables.
Return to your container with machinectl reboot base then machinectl shell base. From here you should be able to run ping hoverbear.org, and ping silicon if silicon is your machine's hostname like mine.
You can check over your work by running btrfs subvolume list / and verifying that there is a var/lib/machines/base subvolume, then running machinectl list-images --all and verifying that a base image is listed.
From here you can create writable clones of the image with machinectl clone base $NAME, then shell into them, and ultimately poweroff them. When you're ready to get rid of them entirely you can machinectl remove them.
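The whole lifecycle, sketched out with a hypothetical clone named scratchpad:

```bash
machinectl clone base scratchpad   # writable snapshot of the read-only base
machinectl start scratchpad
machinectl shell scratchpad        # poke around inside
machinectl poweroff scratchpad
machinectl remove scratchpad       # gone for good -- see the anti-feature note below
```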
Later on you can upgrade your base image in a similar way: make it writable and start it, give it a moment to boot, then issue the upgrade command. After another moment you can finish up by powering it off and sealing it again.
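A sketch of that upgrade cycle:

```bash
machinectl read-only base false             # make the image writable
machinectl start base
sleep 5                                     # give it a moment to boot
machinectl shell base /usr/bin/pacman -Syu  # issue the upgrade
machinectl poweroff base
machinectl read-only base true              # seal it back up
```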
At this point you might have identified a potential problem coming up. How do we create and maintain all these gosh darn containers? Well, that's a great question! We're going to try using Puppet!
This is a very different strategy compared to the one taken by Docker. Docker uses declarative Dockerfiles to build images; then you upgrade either by replacing the current running container or by executing commands via the exec subcommand. With puppet we'll be able to easily execute pre-defined provisioning scripts and maintain our running containers.
Also, I want to learn Puppet, so this is a great excuse. You can review the Puppet docs as well as this guide by DigitalOcean. Our plan is a bit different from theirs.
Let's start by creating a
puppet container from our base.
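That's just another clone:

```bash
machinectl clone base puppet
```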
Since we're basing our
puppet image on our
base Arch image, we can have them share repository caches. Neat! Next we'll create an environments subvolume on our BTRFS volume so we can mount it into the puppet image. This will help us not accidentally delete things. We're 'in' a subvolume already, so we need to actually mount the true root for this.
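A sketch, assuming /dev/sda2 is the BTRFS volume and /mnt is free (both assumptions):

```bash
# Mount the true BTRFS root (subvolume id 5), not the subvolume we booted into.
mount -o subvolid=5 /dev/sda2 /mnt
btrfs subvolume create /mnt/environments
umount /mnt
```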
Then we'll create /etc/systemd/nspawn/puppet.nspawn, which is the container-specific configuration. (See the anti-feature note in the next section.)
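A sketch of that file; both paths in the Bind= are assumptions (the host side being wherever you mount the subvolume):

```ini
# /etc/systemd/nspawn/puppet.nspawn
[Files]
# Mount our environments subvolume into Puppet's code directory.
Bind=/puppet:/etc/puppetlabs/code/environments
```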
Now we can start the machine and hop inside.
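That is:

```bash
machinectl start puppet
machinectl shell puppet
```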
Inside of this container we can start setting up Puppet! First we'll install it:
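Something like the following, since Arch packages Puppet in its repositories:

```bash
# In puppet container
pacman -Syu puppet
```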
Before we start making manifests we need to get the server working though! In order to do this we need to edit our configuration file at /etc/puppetlabs/puppet/puppet.conf.
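A sketch of the relevant settings; the certname and server values here are assumptions matching the container's name:

```ini
# In puppet container: /etc/puppetlabs/puppet/puppet.conf
[main]
certname = puppet
server   = puppet
```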
Next we'll make a simple
.service file for
puppet master since it doesn't come with one by default. You can run
systemctl cat puppet to see the configuration of
puppet agent. We'll basically just copy this.
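A sketch of such a unit, modeled on the stock agent unit; the exact flags are assumptions:

```ini
# /etc/systemd/system/puppetmaster.service
[Unit]
Description=Puppet master
Wants=basic.target
After=basic.target network.target

[Service]
Type=simple
ExecStart=/usr/bin/puppet master --no-daemonize
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process

[Install]
WantedBy=multi-user.target
```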
Now you can start it with
systemctl start puppetmaster. At this point let's consider that our
puppet container should probably be managed by Puppet as well! Let's do that to learn how to use Puppet, then we'll figure out networking.
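To give the master something to enforce, we can drop a trivial resource into the default site manifest. This file resource is an assumption; it just creates the /puppet-test marker mentioned below:

```puppet
# /etc/puppetlabs/code/environments/production/manifests/site.pp
node default {
  # A trivial resource so we can watch Puppet do something.
  file { '/puppet-test':
    ensure  => file,
    content => "Managed by Puppet!\n",
  }
}
```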
Now you can run
systemctl start puppet (this is the agent) and you might see that
/puppet-test now exists. If not you might either need to wait a moment or restart
puppet. This is due to the agent's syncing behavior, which we can tame later.
To make this persist, run systemctl enable puppet and systemctl enable puppetmaster. Remember that you can edit this configuration from outside the container by editing the things in /puppet on our root.
All we need to do now is add Puppet to our
base. First make it writable with
machinectl read-only base false, start it, then shell in with
machinectl shell base.
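Inside, install the agent and enable it, something like:

```bash
# Inside the base container.
pacman -Syu puppet
systemctl enable puppet
```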
Then poweroff it and mark it read-only again with
machinectl read-only base true. Now when we clone it the new containers will automatically run puppet.
Fixing an Anti-Feature
A note: if you machinectl remove $MACHINE, systemd will, in a rather user-hostile manner, delete your /etc/systemd/nspawn/$MACHINE.nspawn file, with no output, no warning, and no flag to disable this anti-feature.
So let's fix it! In fact, we can make our /etc/systemd/nspawn folder a subvolume for safekeeping. With config/puppetlabs/ also a subvolume, our primary container configuration will be stored safely in subvolumes. If you run btrfs subvolume list / you may notice that our machines have been creating subvolumes as they go. This is great; we can use these for snapshots later. We'll create a config/nspawn subvolume. Recall that we're 'in' a subvolume already, so we need to actually mount the true root for this again.
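A sketch, with the same device-name assumption as before:

```bash
mount -o subvolid=5 /dev/sda2 /mnt
btrfs subvolume create /mnt/config/nspawn
umount /mnt
```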
```bash
# Remount the folders.
# (Reconstruction; assumes an fstab entry bind-mounting the config/nspawn
# subvolume read-only over /etc/systemd/nspawn.)
umount /mnt && mount --all
```
Right now if you try to touch /etc/systemd/nspawn/web.puppet, you can't! Instead, you can create the file in the config/nspawn subvolume and it will appear in place.
Right now if you start up base and try to ping puppet from it, it won't work. You can observe the differences in their IP addresses; mine were link-local addresses like 169.254.221.43/16, on different subnets. We need some way to bring them together.
I tried a few ways of doing this (I actually spent six days trying different ideas) and settled on this one because it's the simplest and seems easiest to understand.
First we'll go ahead and create a bridge. A bridge will let us join all the containers together on a single network so they can resolve each other's names. For now this is sufficient, though there is some documentation suggesting VLANs can be used to improve this system. Perhaps we'll explore that later.
On our 'root' host:
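One way to define the bridge ahead of time is with systemd-networkd; the br0 name and the subnet here are assumptions:

```ini
# /etc/systemd/network/br0.netdev -- define the bridge device.
[NetDev]
Name=br0
Kind=bridge
```

```ini
# /etc/systemd/network/br0.network -- address it and hand out leases.
[Match]
Name=br0

[Network]
Address=10.0.9.1/24
DHCPServer=yes
```

Restart systemd-networkd afterwards so the bridge comes up.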
Then run systemctl edit systemd-nspawn@ and enter the following:
```ini
[Service]
# Set up.
ExecStartPre=/bin/sh
ExecStartPre=/bin/sh
ExecStartPre=/bin/sh
# Do work
ExecStart=
ExecStart=/bin/sh
# Tear down.
ExecStopPost=-/bin/sh
```
Now machinectl clone base test, machinectl poweroff test puppet, and machinectl start test puppet, then try running machinectl shell test and see if you can ping puppet. Finally make sure you can ping hoverbear.org. If all is well then we've had great success.
At this point if you start up both the puppet and test containers you should find a
/puppet-test file on both. You can verify things with
systemctl status puppet.
The Upgrade Problem
Earlier you may have already run into this problem: if we run pacman -Syu and the linux package on the 'root' host gets upgraded, this can cause some problems for us in the long haul. It may manifest as "Operation Not Permitted" errors when using tools like ip, because the running kernel is trying to access modules on the file system which no longer exist.
We can use a tool called
kexec to quickly (and uncleanly) switch to the new kernel. On my server this takes about a third of the time compared to actually rebooting. It's important to note that this doesn't replace a reboot though, as it doesn't make efforts to clean up anything. Things may go funky, and you may still need to reboot.
We'll build a systemd unit for this then enable it like so:
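A sketch of such a unit, assuming Arch's standard kernel and initramfs paths (the unit name is an assumption too, and kexec-tools must be installed):

```ini
# /etc/systemd/system/kexec-load.service
[Unit]
Description=Load the current kernel for kexec
Documentation=man:kexec(8)

[Service]
Type=oneshot
ExecStart=/usr/bin/kexec --load /boot/vmlinuz-linux \
    --initrd=/boot/initramfs-linux.img --reuse-cmdline

[Install]
WantedBy=multi-user.target
```

Then enable it with systemctl enable kexec-load.service.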
Now you can load into the newest kernel with
systemctl kexec when you decide it's time. I'd suggest testing it now before moving on so you know it works! This also works great as a soft reboot.
The Web Proxy
Since we know we'd like to be able to host multiple web services, we'll need to set up a proxy to handle port 80, the default port for httpd. Since only one program can bind to a given port at a time, its job will be to inspect the incoming traffic to observe the Server Name Indication, or SNI, then route it to the appropriate container. By default this will be the web-$VARIABLE container, where the request is for $VARIABLE.$DOMAIN.$TLD. We'll call the proxy itself web.
Later we'll follow a similar scheme for
box, etc. You can use whichever scheme you'd like, or change later, just be consistent.
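For a feel of what SNI routing looks like in haproxy, here's a sketch; the backend and server names are assumptions following the scheme above:

```
# /etc/haproxy/haproxy.cfg (excerpt)
frontend https-in
    bind *:443
    mode tcp
    # Wait long enough to see the TLS ClientHello, then read the SNI.
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    use_backend web-silicon if { req_ssl_sni -i silicon.hoverbear.org }

backend web-silicon
    mode tcp
    server web-silicon web-silicon:443
```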
We'll start by editing our puppet configuration to define what this host should look like. Since we have the puppet configuration subvolume mounted on
/puppet on our host we can just edit it from there.
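What exactly that manifest contains isn't shown here, but presumably it at least installs and runs haproxy. A sketch under that assumption:

```puppet
# /puppet/environments/production/manifests/site.pp (host-side path)
node 'web' {
  package { 'haproxy':
    ensure => installed,
  }

  service { 'haproxy':
    ensure  => running,
    enable  => true,
    require => Package['haproxy'],
  }
}
```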
We can set up the container to bind to our host ports 80 and 443.
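In nspawn terms, that's a couple of Port= lines in the container's .nspawn file. A sketch, reusing the br0 assumption from earlier:

```ini
# config/nspawn/web.nspawn
[Network]
Bridge=br0
Port=tcp:80:80
Port=tcp:443:443
```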
Then add in our firewall rules to the /etc/iptables/iptables.rules file, and reload the firewall:
```diff
 # Allow SSH connections.
 --append TCP --protocol tcp --dport 22 --jump ACCEPT
 # Allow HTTP connections.
+--append TCP --protocol tcp --dport 80 --jump ACCEPT
 # Allow HTTPS connections.
+--append TCP --protocol tcp --dport 443 --jump ACCEPT
```
Finally we can create the machine:
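Presumably the same clone-and-start dance:

```bash
machinectl clone base web
machinectl start web
```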
Give it a few seconds then try running
systemctl status haproxy -M web.
One caveat: puppet doesn't wait for name resolution to be configured and needs to be restarted with
systemctl restart puppet -M web. I'm currently pondering solutions to this.
Rest and Test
Since the hostname silicon.hoverbear.org is already mapped in DNS for me, I'll use that for testing. You want to make your container name follow the web-$VARIABLE scheme; mine will be web-silicon.
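Something like:

```bash
machinectl clone base web-silicon
machinectl start web-silicon
```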
Since we're testing we don't need to be clean at all. Just shell in, run pacman -Syu nginx openssl, and edit the /etc/nginx/nginx.conf on that container according to this diff:
```diff
-listen 80;
+listen [::]:443 ssl;
+ssl_certificate /etc/nginx/ssl/nginx.crt;
+ssl_certificate_key /etc/nginx/ssl/nginx.key;
```
Then generate the certificate with:
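A sketch using a self-signed certificate at the paths from the diff above; fine for testing, nothing else:

```bash
mkdir -p /etc/nginx/ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout /etc/nginx/ssl/nginx.key \
    -out /etc/nginx/ssl/nginx.crt
```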
Then start up the
nginx service on that container. Verify you can
curl web-silicon from both the container itself and your
web container. You can check
systemctl status -M web-silicon nginx and
systemctl status -M web haproxy for any messages.
With any luck, you'll see the
nginx default web page.
If all is good you can make these machines default to running with
machinectl enable web puppet. In our next post we'll get around to setting up a more refined web service. Remember,
web-silicon was just a hello world.
Over time our system will inevitably change, grow, and decay. We've been spending time to make sure that whatever happens we'll have the tools to keep going. We haven't gone out of our way to over-engineer anything so far, we've just made what we need with a few sharp tools.
As we keep moving ahead we'll discover more needs and eventually build up our services to be better and stronger as a whole, but for now we have the basis to start our work.
nspawn containers are persistent, but we can remove them whenever we want and re-create them with a new configuration. Our Puppet master will re-configure new images the same as old ones, and what we extend out with that can be retroactively applied to all. The key is for us to maintain a persisting copy of the data necessary in a safe space. BTRFS and the
Bind= option of
nspawn enable this.
In our next post we'll explore how to deploy a basic folder-based nginx host and the RustFestEU Call for Papers site. Then, it'll go live! Whoa! But first, let's take a break. This post was so long!