How does mount propagation behave when calling clone with `CLONE_NEWUSER|CLONE_NEWNS`?

Question

My program calls clone and then executes /bin/sh in the child process.

In the shell, I run cat /proc/$$/mountinfo to see the propagation type of each mount. With only CLONE_NEWNS set, I get this:

# cat /proc/$$/mountinfo
194 193 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw,discard,errors=remount-ro
...

If I combine CLONE_NEWNS with CLONE_NEWUSER (by uncommenting flags |= CLONE_NEWUSER; in the source below), I get this:

199 198 8:1 / / rw,relatime master:1 - ext4 /dev/sda1 rw,discard,errors=remount-ro
...

Why does CLONE_NEWUSER make a difference? On my machine (Debian 9), I expected the mount to always be MS_SHARED, since it is copied from an MS_SHARED mount point.

#define _GNU_SOURCE
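#include <sched.h>      /* clone, CLONE_NEWNS, CLONE_NEWUSER */
#include <signal.h>     /* SIGCHLD */
#include <stdio.h>      /* printf, perror */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* execv, getpid */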

#define STACK_SIZE (1024 * 1024)

static char container_stack[STACK_SIZE];

char *const container_args[] = {"/bin/sh", NULL};

int container_main(void *arg) {
  printf("Container - inside the container!\n");
  printf("container pid is %d\n", getpid());
  int status = execv(container_args[0], container_args);
  if (status < 0) perror("execv");
  printf("Something's wrong!\n");
  return 0;
}

int main() {
  printf("Parent [ %d ] - start a container!\n", getpid());

  int flags = CLONE_NEWNS;
  //flags |= CLONE_NEWUSER;

  int container_pid = clone(container_main, container_stack + STACK_SIZE,
                            SIGCHLD | flags, NULL);
  if (container_pid < 0) {
    perror("clone");
    return -1;
  }

  printf("Container pid is %d\n", container_pid);
  waitpid(container_pid, NULL, 0);
  printf("Parent - container stopped!\n");
  return 0;
}

Joseph Sible-Reinstate Monica · Accepted Answer

man 7 mount_namespaces explains it. Relevant excerpts:

   *  Each mount namespace has an owner user namespace.  As
      explained above, when a new mount namespace is created, its
      mount point list is initialized as a copy of the mount point
      list of another mount namespace.  If the new namespace and the
      namespace from which the mount point list was copied are owned
      by different user namespaces, then the new mount namespace is
      considered less privileged.

   *  When creating a less privileged mount namespace, shared mounts
      are reduced to slave mounts.  (Shared and slave mounts are
      discussed below.)  This ensures that mappings performed in
      less privileged mount namespaces will not propagate to more
      privileged mount namespaces.

   shared:X
          This mount point is shared in peer group X.  Each peer
          group has a unique ID that is automatically generated by
          the kernel, and all mount points in the same peer group
          will show the same ID.  (These IDs are assigned starting
          from the value 1, and may be recycled when a peer group
          ceases to have any members.)

   master:X
          This mount is a slave to shared peer group X.
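
To check which case applies to each mount without reading the tags by eye, the optional fields quoted above can be classified programmatically. This is only a minimal sketch and not part of the man page: propagation_of is a hypothetical helper name, and it assumes the mountinfo layout documented in proc(5), where the optional fields sit between the sixth field and the lone "-" separator.

```c
#include <stdio.h>
#include <string.h>

/*
 * Classify one /proc/<pid>/mountinfo line by its propagation tags.
 * Hypothetical helper, for illustration only.
 * Writes a short label into out and returns 0, or returns -1 if the
 * line does not look like a mountinfo entry.
 */
int propagation_of(const char *line, char *out, size_t outlen)
{
    const char *p = line;

    /* Skip the six fixed fields: mount ID, parent ID, major:minor,
     * root, mount point, per-mount options. */
    for (int i = 0; i < 6; i++) {
        p = strchr(p, ' ');
        if (p == NULL)
            return -1;
        p++;
    }

    /* The optional fields end at the " - " separator. Start the search
     * one byte earlier so a line with no optional fields still matches. */
    const char *sep = strstr(p - 1, " - ");
    if (sep == NULL)
        return -1;

    char opt[256];
    size_t len = sep > p ? (size_t)(sep - p) : 0;
    if (len >= sizeof opt)
        return -1;
    memcpy(opt, p, len);
    opt[len] = '\0';

    int shared = strstr(opt, "shared:") != NULL;  /* member of a peer group */
    int slave = strstr(opt, "master:") != NULL;   /* receives from a master group */
    const char *label;

    if (shared && slave)
        label = "shared+slave";
    else if (shared)
        label = "shared";
    else if (slave)
        label = "slave";
    else if (strstr(opt, "unbindable") != NULL)
        label = "unbindable";
    else
        label = "private";

    snprintf(out, outlen, "%s", label);
    return 0;
}
```

Given the two lines from the question, this reports "shared" for the CLONE_NEWNS output and "slave" for the CLONE_NEWNS|CLONE_NEWUSER output. Calling it on every line read from /proc/self/mountinfo in the child shows which mounts were reduced.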
