Today I learned that Docker automatically does extremely wreckless and dangerous things to the firewall on every node in a docker swarm, which leaves various swarm services wide open to the internet, and that Docker provides no way to turn off this behavior. I spent a lot of time today trying to understand what the hell was happening, and then a lot more time trying to find a solution to fix it. I think I found a better solution than suggested anywhere else, so here it is for anyone who stumbles here via a search engine.
Docker automatically reconfigures
iptables to set up its networking. If you have a service port of type
ingress, you will see a few rules in your
iptables corresponding to it, which take care of routing that traffic in the right direction. These rules allow traffic from any source to route to any service ports of type
ingress. This is arguably helpful for the
tcp/443. I wouldn’t make that argument, but someone might. However, I have some ingress ports defined to make cluster-wide services such as logging and discovery internally available. I do not want these exposed to the internet, and I did not expected Docker to automatically reconfigure the filewall to make them available, but that’s what happened. Even worse, I noticed that Docker had opened port
8080 on my test cluster, directly connecting to a docker-container-stats instance, which does not even define an ingress port. I don’t know why it did this - most other non-ingress service ports were still safely firewalled off, but this one was open. A mystery.
None of this behavior is visible if you try to find them with
ufw. You need to go a level deeper, and look at the whole
iptables config using a command like
This is all surprising and dangerous. External access into services should be by explicit configuration only. Docker uses
iptables for internal networking, but most of us use it as our firewall. When docker adds a rule opening access to a service, it is opening it up to the World.
I spent a long time today simply not believing my eyes, and then searching for an explanation as to how this could have been built this way. The only explanation I found was that maybe the engineers who wrote this didn’t expect Docker Swarm to be run directly on the Internet. Maybe they thought that there would always be separate (non-swarm, maybe non-Docker) load balancer and firewall in front of docker swarm nodes (allowing them to throw caution to the wind and have fun with the node firewalls). For me, however, the point of having a cluster is to run all my infrastructure inside of it.
Docker documentation discusses an
iptables: false option that can be supplied through the
/etc/docker/daemon.json file. By “discuss”, I should really say “warns you not to use it”. Anyway, I tried using it, and discovered that it simply does not work at all (and open issues related to this appear to be unattended for years).
One solution to this security nightmare is to put a non-swarm loadbalancer and firewall in front of docker swarm. Another is to flush all docker firewall rules out of
iptables, however doing this breaks inter-service connectivity in my testing. So I did some further testing.
When docker configures iptables for an ingress service port, it creates three rules, such as
-A DOCKER-INGRESS -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.18.0.2:8080 -A DOCKER-INGRESS -p tcp -m tcp --dport 8080 -j ACCEPT -A DOCKER-INGRESS -p tcp -m state --state RELATED,ESTABLISHED -m tcp --sport 8080 -j ACCEPT
In my cluster, an ingress port might be
443, used for the load balancer (I am using docker-flow-proxy running with letsencrypt-provisioner). Or it might be an internal service, which I want to be available across every container in every stack in the cluster (e.g., logging). I achieve the latter by having logging clients connect to the
DOCKER_GW_BRIDGE IP address. So, in the above example, I can connect to
172.18.0.1:8080 from any container and reach the service (i.e.,
nc 172.18.0.1 8080 succeeds)
By removing any of the above rules, I would disable access both from the internet and from inside the cluster. However, after a bit of experimenting, I found that I could harden the rules by restricting the
ACCEPT rule to only allow internal docker source IP addresses (i.e.,
-A DOCKER-INGRESS -p tcp -m tcp -s 172.16.0.0/12 --dport 8080 -j ACCEPT
I found that I could automatically process
iptables to make this change using
sudo iptables-save | sed -re "s/(-A DOCKER-INGRESS -p (udp|tcp) -m (udp|tcp) --dport .*)( -j ACCEPT)/\1 -s 172.16.0.0\/12 \4/" > /tmp/iptables-fixed.txt sudo iptables-restore < /tmp/iptables-fixed.txt
At this point, internal communication was open between containers as normal, but internet access to ingress ports had been shut down. Obviously, I actually want to be able to serve web traffic, so to re-open web I needed to add a rule. Simply adding a new rule in
ufw won’t work, because the
iptables chain will be evaluated first. Docker provides the
DOCKER-USER chain for this purpose. We can insert a new rule in there and it will get evaluated first.
To explicitly open access to a port to the Internet, you can use the
-I option to insert a new rule at the top of the
DOCKER-USER iptables chain, which is evaluated before the
sudo iptables -I DOCKER-USER -p tcp -m tcp --dport 80 -j ACCEPT
Putting this altogether, here is a script that hardens Docker’s lax iptables rules, and then explicitly opens up web.
#! /usr/bin/bash # Docker swarm automatically makes service ports available to the internet by creating new iptable rules. # These firewall rules will not show up in ufw. # This script is intended to be run as a `ExecStartPost` step in dockerd's systemd unit, and tightens up any # iptable rules it finds left there by docker. In particular, it limits all traffic to service ports to # internal docker client IPs in the subnet 172.16.0.0\/12. # In addition, it opens up the HTTP/HTTPS ports. iptables-save | sed -re "s/(-A DOCKER-INGRESS -p (udp|tcp) -m (udp|tcp) --dport .*)( -j ACCEPT)/\1 -s 172.16.0.0\/12 \4/" > /tmp/iptables-fixed.txt iptables-restore < /tmp/iptables-fixed.txt rm /tmp/iptables-fixed.txt # Open up HTTP/HTTPS ports. Avoid creating duplicate rules by checking with `iptables -C` rule="DOCKER-USER -p tcp -m tcp --dport 80 -j ACCEPT" if ! iptables -C $rule 2> /dev/null; then iptables -I $rule fi rule="DOCKER-USER -p tcp -m tcp --dport 443 -j ACCEPT" if ! iptables -C $rule 2> /dev/null; then iptables -I $rule fi
I then made sure this script is run automatically when docker starts by adding it to the
docker.service systemd unit as a post execution hook. On Ubuntu 20.04, this file resides at
[Service] # ... # ExecStart=... ExecStartPost=/usr/local/bin/fix-docker-iptables-rules.sh
Finally, for good measure I also invoke this script during the final build step during CI deployments.