Prevent Privilege Escalation from Container Breakout via UserNS Remapping
Hello World! In my previous posts, I discussed at length how a user with certain capabilities can escape a Docker container and execute commands as root on the host. A naive approach to fixing this issue could be a combination of the following:
- Disable capabilities like CAP_DAC_READ_SEARCH, CAP_SYS_MODULE, etc.
- Relinquish the root user privileges before executing ENTRYPOINT in the Dockerfile
- Implement a firewall to disable privileged containers and mounting of the file system using the -v argument, and use volumes instead
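As a rough illustration of the first two mitigations, the sketch below assembles a docker run command line that drops the listed capabilities and starts the container as a non-root user. The helper name hardened_run_args and the 1000:1000 user are assumptions made for illustration; --cap-drop and --user are standard docker run flags.

```python
import shlex


def hardened_run_args(image, drop_caps, user):
    """Build a docker run argv that drops capabilities and sets a non-root user."""
    args = ["docker", "run", "--rm"]
    for cap in drop_caps:
        # Docker's documentation writes capability names without the CAP_ prefix.
        name = cap[4:] if cap.startswith("CAP_") else cap
        args += ["--cap-drop", name]
    args += ["--user", user, image]
    return args


cmd = hardened_run_args(
    "rootme:latest",
    ["CAP_DAC_READ_SEARCH", "CAP_SYS_MODULE"],
    "1000:1000",  # hypothetical unprivileged UID:GID
)
print(shlex.join(cmd))
```

The function only builds the argument list; nothing is executed, so it can be inspected before being handed to a process launcher.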
In most cases, however, some of these options are required. For instance, in one of the applications that I am working on right now, we save build time for production releases by reusing the image of the stage environment and replacing the environment file at run time using a bind mount. Instead, remap the default root user, which is used to spawn the containerd-shims and then the child processes, to a separate low-privileged user. This technique is known as User Namespace Remapping in the Docker world.
Implementing User Namespace Remapping
You can see the current session is running as a low-privileged user, student. However, it is allowed to perform all actions on Docker because it has been added to the docker group, which means it can interact with the Docker UNIX socket.
There are two repositories cloned in the home directory which I will be using to demonstrate the remapping and then try to exploit it.
Spoiler alert! It won't happen 😅
In the docker-privsec directory you will find a shell script which contains the instructions to implement the remapping.
You will find the following contents in the userns-remap.sh script. The first two commands are pretty straightforward: they create a user and group named dockremap and set the shell to /bin/false so that the account cannot be used to log in.
You will also see that it updates the /etc/docker/daemon.json file and adds { "userns-remap": "default" } to it. Edit the echo line in the file as shown below to support both insecure registries and user namespace remapping.
The default value of user namespace remapping in Docker points to the dockremap user. If you wish to use a different user, make sure to change this value to that user and group, in the format user:group.
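The effect of that edit can be sketched as a small merge: set userns-remap while leaving any existing keys, such as insecure-registries, untouched. This is only an illustration of the resulting JSON, not the script from the repository; the function name with_userns_remap and the registry address are placeholders.

```python
import json


def with_userns_remap(config_text, remap="default"):
    """Return daemon.json text with userns-remap set, keeping existing keys."""
    config = json.loads(config_text) if config_text.strip() else {}
    config["userns-remap"] = remap
    return json.dumps(config, indent=2)


# Hypothetical existing config; the registry address is a placeholder.
existing = '{"insecure-registries": ["registry.example.local:5000"]}'
print(with_userns_remap(existing))

# A specific account would use the user:group form instead, for example:
# with_userns_remap(existing, "someuser:somegroup")
```

Writing the result back to /etc/docker/daemon.json and restarting the daemon is left out on purpose, since that step needs root on the host.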
Lastly, the script reloads the systemd units and then restarts the docker service. Now dockerd will read the updated configuration from the daemon.json file and map the user in the namespace to dockremap.
Note: The password of the root user is provided in the lab description.
Now, go to the $HOME/dockerrootplease directory and edit the Dockerfile, as shown in the following diff. This will let you use a fresh parent image from the registry if it has not been pulled already.
Build the image using the docker build command and give it any tag you want. I am using a short and relevant tag, rootme:latest.
You will find the command to run the exploit in the README.md file as shown below. After copying it, make sure you change the image name to the one used while building.
Run the docker container as shown below and you will see that it spawns a shell after chroot'ing into the /hostOS directory. You can confirm the container breakout from the process listing, which starts with the /sbin/init process.
Even though the effective user and group IDs are 0 (root), you won't be able to read the contents of protected files like /etc/shadow or the flag in /root/flag. The container is so isolated that it cannot even run a directory listing in the home directory of the root user.
The containerd-shim has started the entry point process as the dockremap user, as you can see from the process listing output on the host machine. While accessing resources on the file system and elsewhere, the kernel will use this user instead of the namespace user (root) to check the DAC permissions of the resources.
Note: Remember that containerd-shim will launch the entry point as the root user if no remapping is done.
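To make the DAC point concrete, here is a deliberately simplified model of the owner/group/other read check. It ignores capabilities and ACLs, and the ownership and mode assumed for /etc/shadow (root, group 42, mode 0640) as well as the host GID 99999 are assumptions for illustration rather than values taken from the lab.

```python
def dac_can_read(file_uid, file_gid, mode, proc_uid, proc_gids):
    """Simplified owner/group/other read check (no capabilities, no ACLs)."""
    if proc_uid == file_uid:
        return bool(mode & 0o400)  # owner read bit
    if file_gid in proc_gids:
        return bool(mode & 0o040)  # group read bit
    return bool(mode & 0o004)      # other read bit


# Assumed ownership and mode for /etc/shadow: root (0), group 42, mode 0640.
SHADOW = dict(file_uid=0, file_gid=42, mode=0o640)

# Container root seen by the host as UID/GID 99999: falls into "other".
print(dac_can_read(**SHADOW, proc_uid=99999, proc_gids={99999}))  # False

# Without remapping the process really is host UID 0 and matches the owner.
print(dac_can_read(**SHADOW, proc_uid=0, proc_gids={0}))  # True
```

The real kernel check has more branches, but the outcome is the same in spirit: once root inside the container is an ordinary host UID, root-owned files are judged by their "other" permission bits.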
Why does it work the way it works?
The UID 99999 is mapped within the namespace as UID 0 (root) and is inherited by all the child processes spawned by the first process (entry point). Similarly, this mapping works for the GID. Since the remapping information is transparent to the namespace, you can confirm it by reading the uid_map and gid_map files from the procfs.
Let's ignore the last entry, 65536, for the time being. The first entry in the map file tells you the user or group ID in the namespace, while the second entry tells you the user or group ID outside of the namespace, which the kernel on the host will use.
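The translation the kernel performs with these files can be sketched in a few lines. The sample text below mirrors the mapping described in this section; the third column, set aside in the text, is the length of the mapped range according to user_namespaces(7). The function name to_host_id is made up for this example.

```python
def to_host_id(map_text, ns_id):
    """Translate an ID inside the namespace to the ID the host kernel uses."""
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = map(int, fields)
        if inside <= ns_id < inside + length:
            return outside + (ns_id - inside)
    return None  # the ID is not mapped in this namespace


# Sample content in the uid_map layout: inside-ID, outside-ID, range length.
sample = "         0      99999      65536\n"
print(to_host_id(sample, 0))      # 99999
print(to_host_id(sample, 1000))   # 100999
print(to_host_id(sample, 65536))  # None
```

Inside a running container, the same function could be fed the contents of /proc/self/uid_map or /proc/self/gid_map instead of the sample string.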
How Does This Differ from What fakeroot Does?
When you run a program with fakeroot, it injects its interceptor library via the LD_PRELOAD and LD_LIBRARY_PATH environment variables and patches the system call wrappers on the go. It deliberately does not wrap the open() and create() functions.
In the case of remapping, when containerd runs the program, the configuration is added to the uid_map and gid_map files as shown below. This is then used to map the user and group from inside to outside the container without patching anything at run time.
To learn more about LD_PRELOAD, check out two of the posts on Linux privilege escalation: Understanding Concept of Shared Libraries and Exploiting Shared Library Misconfigurations.
Where the Hell did Images go?
After implementing the namespace remapping, you won't be able to list the existing images anymore, and this is expected behaviour. The Docker daemon (dockerd) creates a separate directory at /var/lib/docker/[uid].[gid] and keeps the images and containers for the remapped user there.
Note: For testing purposes, I have created the user mapping for www-data. That is why you are seeing the 33.33 directory here.
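The naming scheme is simple enough to compute by hand, but a tiny sketch makes it explicit. The helper name remapped_data_dir is hypothetical, and the base path assumes the default /var/lib/docker data root.

```python
from pathlib import PurePosixPath


def remapped_data_dir(uid, gid, base="/var/lib/docker"):
    """Directory in the [uid].[gid] form under the Docker data root."""
    return PurePosixPath(base) / f"{uid}.{gid}"


print(remapped_data_dir(33, 33))        # /var/lib/docker/33.33
print(remapped_data_dir(99999, 99999))  # /var/lib/docker/99999.99999
```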
Resources
- https://docs.docker.com/engine/security/userns-remap/
- https://man7.org/linux/man-pages/man5/subuid.5.html
- https://www.reddit.com/r/linuxquestions/comments/vf1a3w/how_does_subuid_and_subgid_works_with_user/
- https://lwn.net/Articles/532593/
- https://docs.sylabs.io/guides/3.5/admin-guide/user_namespace.html