host ports and hostnetwork: the NATty gritty
if you're familiar with kubernetes you know that pods (the basic workload unit in kubernetes) are all assigned their own IPs and exist in their own separate network (also pid, mount, etc.) namespaces. thus it's possible for two pods living on the same host to bind the same port without any conflicts. similarly, something running in the host's root network namespace would be able to bind the same port with no issues.
sometimes we need a way to statically expose a pod's bound ports on the host's IP. if we do this, we (as k8s operators) need to ensure that there won't be port conflicts wherever the pods are scheduled. this is usually done with daemonsets, which run at most one copy of a pod per node, so scheduling (and thus any port conflict) is obvious. for these reasons, using either hostports or hostnetwork is generally considered an antipattern outside of a few specialized scenarios!
(there are also nodeport services, but these don't directly expose a pod's ports, so I'm not going to talk about them here)
hostNetwork
host network, as you might be able to guess, is a setting that allows the pod to run in the host's root network namespace. it's a field that can be set on the pod spec, and when a pod with hostNetwork: true is launched, two things that usually happen when starting a pod will not happen:
- setting up a separate network namespace for the pod
- running of CNI plugins
skipping these two things is essentially all that's required to implement hostNetwork. now let's talk about the implications of this.
if your pod is going to bind ports, the spec says that you must report them in the ports section of the spec, but there is nothing that actually enforces this. hostNetwork also lets these workloads do things like tcpdump all traffic on a host's real interface. thus, this is a dangerous mode of operation, as these workloads could wreak havoc in the root net namespace.
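to make the shape of this concrete, here's a rough sketch that builds a minimal hostNetwork pod manifest as a python dict and prints it as JSON. the pod name, image, and port number are placeholders I made up for illustration. note that the ports entry is only a declaration: nothing here stops the container from binding some other port on the host.

```python
import json

# hypothetical placeholders: the name, image and port are not from any real deployment
def host_network_pod(name, image, port):
    """Build a minimal Pod manifest that runs in the host's root network namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # skip the per-pod network namespace and CNI setup
            "hostNetwork": True,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # declared for documentation; the bind itself is not enforced
                    "ports": [{"containerPort": port, "protocol": "TCP"}],
                }
            ],
        },
    }

manifest = host_network_pod("node-agent", "example.com/node-agent:latest", 9100)
print(json.dumps(manifest, indent=2))
```

the printed JSON could be saved to a file and handed to kubectl, which accepts JSON manifests as well as YAML.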
another downside of hostNetwork is that traffic from these pods is indistinguishable from traffic from the host the pod is running on. thus, regular k8s network policy cannot be applied against them.
the upside of hostNetwork is that skipping those two steps means your networking provider doesn't need to be up in order for these pods to function, and any time the host has connectivity the pods will have connectivity too. if you have pods that need to come up before pod networking is available (assuming your pod networking is driven by other pods, as is the case for many network providers) then hostNetwork is mandatory.
my recommendation though is to avoid this feature when you can.
host port
host port is a lighter-weight way of binding a port on a host and allows for enforced collision detection at schedule time. it's implemented in the portmap CNI plugin and is a field on the container spec in the ports section.
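here's a toy model of what "collision detection at schedule time" means. this is not the real scheduler's code, just a sketch under simplifying assumptions: each node tracks the (protocol, host port) pairs already claimed, and a pod only fits on a node where none of its requested pairs are taken. the Node class and the fits/schedule helpers are names I invented for the example, and it ignores details like per-IP host port bindings.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    # (protocol, host_port) pairs already claimed by pods on this node
    used: set = field(default_factory=set)

def fits(node, wanted):
    """A pod fits only if none of its requested host ports are already claimed."""
    return not (node.used & set(wanted))

def schedule(nodes, wanted):
    """Place the pod on the first node without a conflict, or return None."""
    for node in nodes:
        if fits(node, wanted):
            node.used |= set(wanted)
            return node.name
    return None

nodes = [Node("node-a"), Node("node-b")]
first = schedule(nodes, [("TCP", 8080)])
second = schedule(nodes, [("TCP", 8080)])
third = schedule(nodes, [("TCP", 8080)])
print(first, second, third)
```

with two nodes, the first two pods asking for TCP 8080 land on different nodes and the third has nowhere to go, which is the same reason a hostport pod can sit unschedulable once every eligible node already has that port claimed.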
when a pod with hostports specified is launched the portmap plugin creates the following iptables rules in the prerouting and output chains of the nat table:
- a rule that looks for any traffic on the host port and retargets it to the destination port and the pod IP. eg if my host port is 8080, my pod port is 8081, and my pod's IP is 172.16.15.15 then the rule looks for traffic coming into the host on 8080 and redirects it to 172.16.15.15:8081
- a rule that looks for hairpin traffic (eg pod -> hostport -> back to pod) and marks it for MASQUERADE. without this rule the source address of the traffic going back to the pod would be the pod's own IP.
- a rule that looks for local traffic (eg localhost -> local address) that marks traffic for masquerading. without this rule the source address of the traffic going to the pod would be 127.0.0.1 which would route to the pod's own net namespace instead of the host's
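the three rules above can be sketched as iptables match/target arguments. to be clear, this is a simplified illustration and not the literal output of the portmap plugin, which organizes its rules into its own chains. the hostport_rules helper and the mark value are assumptions of mine, and the sketch assumes a separate rule (not shown) masquerades marked packets later on. the marking rules come before the DNAT rule here because DNAT ends rule traversal for a packet.

```python
def hostport_rules(host_port, pod_ip, pod_port, proto="tcp", mark="0x2000/0x2000"):
    """Return simplified nat-table rule arguments for one host port mapping.

    The mark value is a hypothetical choice for this sketch.
    """
    match = ["-p", proto, "--dport", str(host_port)]
    return [
        # hairpin: the pod reaching itself through the host port
        ["-s", f"{pod_ip}/32", *match, "-j", "MARK", "--set-xmark", mark],
        # local traffic originating from localhost on the host
        ["-s", "127.0.0.1/32", *match, "-j", "MARK", "--set-xmark", mark],
        # retarget host port traffic to the pod's IP and port
        [*match, "-j", "DNAT", "--to-destination", f"{pod_ip}:{pod_port}"],
    ]

rules = hostport_rules(8080, "172.16.15.15", 8081)
for rule in rules:
    print(" ".join(rule))
```

the last line printed uses the example values from the list above: traffic to port 8080 gets its destination rewritten to 172.16.15.15:8081.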
host ports are a better option than hostNetwork if you can use them, but with one caveat that I think is worth mentioning. because the NAT is implemented in iptables, you won't see a socket listening on the host port, which may be unexpected if you don't understand what's happening. ie nc localhost <myport> will work fine, but if you look at ss -l nothing will show up for <myport>, and you have to go to iptables to see what's actually "listening" on the host.