This is the fifth part of an ongoing series on configuring a new server. In our last post we discussed and configured some basic tools. For all intensive purposes, our 'root' system is complete. What we'll be doing now is building the infrastructure to run containers the way we want to.
Before we get around to setting things up, let's describe the what we're up to.
Currently, the only external service the server is running is ssh. We'd like our server to host more services. These services are likely to be both internal services like databases, and external services like HTTP/HTTPS hosts. Since the IPv4 space is increasingly crowded we'd like to do all of these things from one IPv4 address too.
There are lots of ways to provide isolation to these services. These include virtual machines through something like Xen, containers through something like Docker, or even just chroots. Isolation isn't just for security, it's one way to help increase your security somewhat but it isn't a silver bullet. Light solutions like containers have very little overhead though so the benefits tend to outweigh the costs.
We'll use containers, because they're entertaining and fun to use. This is a laboratory, remember? We're suposed to have fun and play with new things.
Our First Container
We'll be using machinectl to work with our containers. You may already know Docker, CoreOS's rkt, or lxc, and you're more than welcome to use any of those. The important thing is to only use one on this machine. You don't want to be managing multiple container providers. Good news is they're all pretty much compatible with one another, so if you use machinectl you can still use Docker images.
To get started with our first container, we first need to construct one. We'll make a scratch directory and put it in there. We'll tell pacstrap, which came as part of arch-install-scripts, to ignore anything that's likely unnecessary in a container.
|
This should give you an installed package size of under 400MB. That's pretty decent. We could probably go smaller but it doesn't really matter. We don't need to worry about setting up things like iptables since it's already protected via the host, and we don't need to have things like vi because we'll do our editing from outside of the container. We don't need things like linux because we're using the host kernel.
Take a look in ~/scratch and you should see what appears to be a base Arch install like we had before. We're going to use chroot to enable some services by default, finally we're going to import it into machinectl:
|
After a second, this command will be done and we'll have our first image. machinectl list-images shows this:
NAME TYPE RO USAGE CREATED MODIFIED
base subvolume yes 277.6M Wed 2016-05-11 15:52:43 PDT n/a
1 images listed.
We can test out the image with machinectl start base. If you get an "Operation not supported" error here it's because you updated your kernel and haven't restarted. Later we'll talk about kexec to help with this problem.
You can get a shell into the container with machinectl shell base. You'll probably notice there is no network connection. That's because our firewall is blocking us! You can check this by observing iptables -nvL and watching the packet counts go up on the icmp-port-unreachable filter.
Editing our /etc/iptables/iptables.rules we can add the green lines:
# Accept anything from the local loopback.
--append INPUT --in-interface lo --jump ACCEPT
+# Accept anything originating from a container.
+--append INPUT --in-interface ve-+ --jump ACCEPT
+--append FORWARD --in-interface ve-+ --jump ACCEPT
This rule will make it so any communications coming from our container adapters will be permitted accepted. Now reload the firewall with systemctl restart iptables.
Returning to your container with machinectl reboot base then machinectl shell base, from here you should be able to run ping hoverbear.org and ping silicon if silicon is your machine's hostname like mine.
You can check over your work by running btrfs subvolume list / and verifying that there is a var/lib/machines/base subvolume, then running machinectl list-images --all and verifying there is a base image.
From here you can create writable clones of the image with machinectl clone base $NAME, then start them, shell into them, and ultimately poweroff them. When you're ready to get rid of them entirely you can remove them.
Later on you can upgrade your base image in a similar way.
After a moment to start you can issue the upgrade command.
After another moment you can finish up.
Orcestration
At this point you might have identified a potential problem coming up. How do we create and maintain all these gosh darn containers? Well, that's a great question! We're going to try using Puppet!
This is a very different strategy compared to that taken by Docker. Docker uses declarative Dockerfiles to build images, then you upgrade by either replacing the current running container, or executing commands via the run subcommand. With puppet we'll be able to easily execute pre-defined provisioning scripts and maintain our running containers.
Also, I want to learn Puppet, so this is a great excuse. You can review the Puppet docs as well as this guide by Digital Ocean. Our plan is a bit different than those.
Let's start by creating a puppet container from our base.
Since we're basing our puppet image on our base Arch image we can have them share repository caches. Neat! Next we'll create a environments subvolume on our BTRFS volume so we can mount it into the puppet image. This will help us not accidently delete things. We're 'in' a subvolume already so we need to actually mount the true root for this.
We'll create /etc/systemd/nspawn/puppet.nspawn which is the container specific configuration.
See anti-feature note in next section.
Now we can start the machine and hop inside.
Inside of this container we can start seting up Puppet! First we'll install it:
Before we start making manifests we need to get the server working though! In order to do this we need to edit our configuration file at /etc/puppetlabs/puppet/puppet.conf.
# In puppet container
Next we'll make a simple .service file for puppet master since it doesn't come with one by default. You can run systemctl cat puppet to see the configuration of puppet agent. We'll basically just copy this.
Now you can start it with systemctl start puppetmaster. At this point let's consider that our puppet container should probably be managed by Puppet as well! Let's do that to learn how to use Puppet, then we'll figure out networking.
Now you can run systemctl start puppet (this is the agent) and you might see that /puppet-test now exists. If not you might either need to wait a moment or restart puppet. This is due to the syncing behaivor which we can tame later.
Finally run systemctl enable puppet and systemctl enable puppetmaster. Remember that you can edit this configuration from outside the container by editing the things in /puppet on the our root.
All we need to do now is add Puppet to our base. First make it writable with machinectl read-only base false, start it, then shell in with machinectl shell base.
Then poweroff it and mark it read only again with machinectl read-only base true. Now when we clone it the new containers will automatically run puppet.
Fixing an Anti-Feature
A note, if you machinectl remove $MACHINE systemd will, in rather user hostile manner, delete your /etc/systemd/nspawn/$MACHINE.nspawn file, with no output, no warning, or flag to disable this anti-feature.
So let's fix it! In fact, we can make our /etc/sytemd/network/ and /etc/systemd/nspawn folders both subvolumes for safe keeping. With config/puppetlabs/ also as subvolume our primary container configuration will be stored safely in subvolumes.
Reviewing btrfs subvolume list / you may notice that our machines have been creating subvolumes as they go. This is great. We can use these for snapshots later. We'll create a config/network and config/nspawn. Recall that we're 'in' a subvolume already so we need to actually mount the true root for this again.
# Remount the folders.
&&
&&
Great. Try touch /etc/systemd/nspawn/web.puppet, you can't! Instead, you can touch /config/nspawn/web.puppet.
Inter-Container Networking
Right now if you start up base and try to ping puppet from it things it won't work. You can observe the differences in their IP addresses. Mine were 169.254.143.177/16 and 169.254.221.43/16 which are on different subnets. We need some what to bring them together.
I tried a few ways of doing this (actually spent 6 days trying different ideas) and ended up on this because it's the simplest and seems easiest to understand.
First we'll go ahead and create a bridge. A bridge will let us join all the containers together on a single network so they can resolve each others names. For now this is sufficient, though there is some documentation suggestion VLANs can be used to improve this system. Perhaps we'll explore that later.
On our 'root' host:
Then run systemctl edit systemd-nspawn@ and enter the following:
# Set up.
ExecStartPre=/bin/sh
ExecStartPre=/bin/sh
ExecStartPre=/bin/sh
# Do work
ExecStart=
ExecStart=/bin/sh
# Tear down.
ExecStopPost=-/bin/sh
Try issuing machinectl clone base test machinectl poweroff test puppet, machinectl start test puppet, then try running machinectl shell test and see if you can ping puppet. Finally make sure you can ping hoverbear.org. If all is well then we've had great success.
At this point if you start up both the puppet and test container you should find a /puppet-test file on both. You can verify things with systemctl status puppet.
The Upgrade Problem
Earlier you made have already run into this problem, but if we run pacman -Syu and the linux package on the 'root' container gets upgraded this can cause some problems for us in the long haul. This may manifest as "Operation Not Permitted" errors when using tools like ip, this is because the kernel is trying to access modules on the file system which no longer exist.
We can use a tool called kexec to quickly (and uncleanly) switch to the new kernel. On my server this takes about a third of the time compared to actually rebooting. It's important to note that this doesn't replace a reboot though, as it doesn't make efforts to clean up anything. Things may go funky, and you may still need to reboot.
We'll build a systemd unit for this then enable it like so:
Now you can load into the newest kernel with systemctl kexec when you decide it's time. I'd suggest testing it now before moving on so you know it works! This also works great as a soft reboot.
The Web Proxy
Since we know we'd like to host be able to host multiple web services we'll need to set up a proxy to handle port 80, the default socket for httpd. Since only one program can bind to a given port at a time, its job will be to inspect the incoming traffic to observe the Server Name Indentification, or SNI, then route it to the appropriate container. By default this will be the web-$VARIABLE container, where the request is $VARIABLE.$DOMAIN.$TLD. We'll call the proxy itself web.
[network: example.hoverbear.org]->[web]->[web-example]
Later we'll follow a similar scheme for db, usr, box, etc. You can use whichever scheme you'd like, or change later, just be consistent.
We'll start by editing our puppet configuration to define what this host should look like. Since we have the puppet configuration subvolume mounted on /puppet on our host we can just edit it from there.
We can set up the container to bind to our host ports 80 and 443.
Then add in our firewall rules to the /etc/iptables/iptables.rules file, and reboot iptables.
# Allow SSH connections.
--append TCP --protocol tcp --dport 22 --jump ACCEPT
# Allow HTTP connections.
+--append TCP --protocol tcp --dport 80 --jump ACCEPT
# Allow HTTPS connections.
+--append TCP --protocol tcp --dport 443 --jump ACCEPT
Finally we can create the machine:
Give it a few seconds then try running systemctl status haproxy -M web.
Gotcha: Sometimes puppet doesn't wait for the name resolution to configure and needs to be restarted with systemctl restart puppet -M web. I'm currently pondering solutions to this.
Rest and Test
Since the hostname is silicon.hoverbear.org is already mapped in DNS for me, I'll use that for testing. You want to make your container web-$SUBDOMAIN.
Since we're testing we don't need to be clean at all. Just shell in, pacman -Syu nginx openssl and edit the /etc/nginx/nginx.conf on that container according to this diff:
-listen 80;
+listen [::]:443 ssl;
+ssl_certificate /etc/nginx/ssl/nginx.crt;
+ssl_certificate_key /etc/nginx/ssl/nginx.key;
Then generate the certificate with:
Then start up the nginx service on that container. Verify you can curl web-silicon from both the container itself and your web container. You can check systemctl status -M web-silicon nginx and systemctl status -M web haproxy for any messages.
With any luck, you'll see the nginx default web page.
If all is good you can make these machines default to running with machinectl enable web puppet. In our next post we'll get around to setting more refined web service. Remember, web-silicon was just a hello world.
System Evolution
Over time our system will inevitably change, grow, and decay. We've been spending time to make sure that whatever happens we'll have the tools to keep going. We haven't gone out of our way to over-engineer anything so far, we've just made what we need with a few sharp tools.
As we keep moving ahead we'll discover more needs and eventually build up our services to be better and stronger as a whole, but for now we have the basis to start our work. nspawn containers are persistent, but we can remove them whenever we want and re-create them with a new configuration. Our Puppet master will re-configure new images the same as old ones, and what we extend out with that can be retroactively applied to all. The key is for us to maintain a persisting copy of the data necessary in a safe space. BTRFS and the Bind= option of nspawn enable this.
In our next post we'll explore how to deploy a basic folder based nginx host and the RustFestEU Call for Papers site. Then, it'll go live! Whoa! But first, let's take a break. That post was so long!