Key features and benefits
Linux user namespace on all containers
With Enhanced Container Isolation, all user containers use the Linux user namespace for additional isolation. As a result, the root user in a container maps to an unprivileged user in the Docker Desktop Linux VM.
For example:
$ docker run -it --rm --name=first alpine
/ # cat /proc/self/uid_map
0 100000 65536
The output 0 100000 65536 is the signature of the Linux user namespace. The root user (0) in the container maps to unprivileged user 100000 in the Docker Desktop Linux VM, and the mapping extends over a contiguous range of 64K user IDs. The same mapping applies to group IDs.
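The arithmetic behind a uid_map line can be sketched as follows. This is a minimal illustration, not Docker or Sysbox code: map_uid is a hypothetical helper, and it assumes a single mapping line of the form "inside outside count" like the one shown above.

```shell
# map_uid is a hypothetical helper, not part of Docker or Sysbox.
# It translates a container user ID into the user ID seen outside the
# user namespace, given one uid_map line: <inside> <outside> <count>.
map_uid() {
    uid=$1 inside=$2 outside=$3 count=$4
    # IDs outside [inside, inside + count) have no mapping
    if [ "$uid" -lt "$inside" ] || [ "$uid" -ge $((inside + count)) ]; then
        echo "unmapped"
        return 1
    fi
    echo $((uid - inside + outside))
}

# Mapping taken from the first container above: 0 100000 65536
map_uid 0 0 100000 65536       # container root, prints 100000
map_uid 1000 0 100000 65536    # container user 1000, prints 101000
```

Under these assumptions, any container user ID from 0 to 65535 lands in the unprivileged range 100000 to 165535 in the Linux VM.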
Sysbox assigns each container an exclusive mapping range. For example, launching a second container shows a different range:
$ docker run -it --rm --name=second alpine
/ # cat /proc/self/uid_map
0 165536 65536
In contrast, without Enhanced Container Isolation, the container's root user is also root on the host ("true root"), and this applies to all containers:
$ docker run -it --rm alpine
/ # cat /proc/self/uid_map
0 0 4294967295
By using the Linux user namespace, Enhanced Container Isolation ensures that container processes never run as user ID 0 (true root) in the Linux VM. In fact, they never hold any valid user ID in the Linux VM, so their capabilities are confined to resources within the container, which improves isolation significantly over regular containers.
Privileged containers are also secured
Privileged containers, started with docker run --privileged ..., are insecure because they have full access to the Linux kernel: all capabilities are enabled, seccomp and AppArmor restrictions are disabled, and all hardware devices are accessible.
With Enhanced Container Isolation, privileged containers can access only the resources assigned to them, through the Linux user namespace and the other security techniques used by Sysbox. This reduces the risk of a privileged container changing the configuration of the Linux VM.
Enhanced Container Isolation does not prevent privileged containers from launching; it makes them safer to run. As a consequence, privileged workloads that change global kernel settings (for example, loading a kernel module or changing Berkeley Packet Filter (BPF) settings) receive a "permission denied" error and do not work properly.
For example, Enhanced Container Isolation prevents a privileged container from accessing the network settings that Docker Desktop configures in the Linux VM using BPF:
$ docker run --privileged djs55/bpftool map show
Error: can't get next map: Operation not permitted
In contrast, when Enhanced Container Isolation is not enabled, a privileged container can do this easily:
$ docker run --privileged djs55/bpftool map show
17: ringbuf name blocked_packets flags 0x0
key 0B value 0B max_entries 16777216 memlock 0B
18: hash name allowed_map flags 0x0
key 4B value 4B max_entries 10000 memlock 81920B
20: lpm_trie name allowed_trie flags 0x1
key 8B value 8B max_entries 1024 memlock 16384B
Some advanced container workloads (for example, Docker-in-Docker and Kubernetes-in-Docker) require privileged containers. With Enhanced Container Isolation, you can still run such workloads, and more securely than before.
Containers can't share namespaces with the Linux VM
When Enhanced Container Isolation is enabled, containers can't share Linux namespaces (for example, PID, network, or uts) with the host. This preserves isolation.
For example, an attempt to share the PID namespace fails:
$ docker run -it --rm --pid=host alpine
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: error in the container spec: invalid or unsupported container spec: sysbox containers can't share namespaces [pid] with the host (because they use the linux user-namespace for isolation): unknown.
Similarly, an attempt to share the network namespace fails:
$ docker run -it --rm --network=host alpine
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: error in the container spec: invalid or unsupported container spec: sysbox containers can't share a network namespace with the host (because they use the linux user-namespace for isolation): unknown.
In addition, the --userns=host flag, which disables the user namespace on a container, is ignored:
$ docker run -it --rm --userns=host alpine
/ # cat /proc/self/uid_map
0 100000 65536
Finally, using the --network=host flag with Docker build, or using the Docker buildx entitlements (network.host and security.insecure), is not allowed. Builds that require them do not work properly.
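The namespace rules in this section can be turned into a quick pre-flight check for existing scripts. The sketch below is only an illustration: check_host_ns is a hypothetical helper, it recognizes only the --flag=host spelling, and it covers only the flags named in this section.

```shell
# check_host_ns is a hypothetical helper, not a Docker command.
# It scans docker run arguments for the host-namespace flags described
# above and reports how Enhanced Container Isolation treats each one.
# Only the --flag=host form is recognized.
check_host_ns() {
    status=0
    for arg in "$@"; do
        case "$arg" in
            --pid=host|--network=host|--uts=host)
                echo "rejected: $arg"
                status=1
                ;;
            --userns=host)
                echo "ignored: $arg"
                ;;
        esac
    done
    return $status
}

# Returns non-zero when at least one flag would be rejected
check_host_ns run -it --rm --pid=host alpine || echo "container would fail to start"
```

Running such a check against CI scripts before enabling Enhanced Container Isolation can surface commands that would start failing.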
Bind mount restrictions
When Enhanced Container Isolation is enabled, Docker Desktop users can still bind mount host directories into containers, as configured in Settings > Resources > File sharing. However, they can no longer bind mount arbitrary Linux VM directories into containers.
This prevents containers from modifying sensitive files inside the Docker Desktop Linux VM, such as configuration files for registry access management, proxies, and Docker Engine.
For example, an attempt to bind mount the Docker Engine configuration file (/etc/docker/daemon.json inside the Linux VM) into a container is restricted and fails:
$ docker run -it --rm -v /etc/docker/daemon.json:/mnt/daemon.json alpine
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: error in the container spec: can't mount /etc/docker/daemon.json because it's configured as a restricted host mount: unknown
In contrast, when Enhanced Container Isolation is disabled, this mount succeeds and gives the container full read and write access to the Docker Engine configuration file.
Bind mounts of host files continue to work as usual. For example, if a user configures Docker Desktop to share their $HOME directory, they can bind mount it into a container:
$ docker run -it --rm -v $HOME:/mnt alpine
/ #
By default, Enhanced Container Isolation does not allow bind mounting the Docker Engine socket (/var/run/docker.sock) into a container, because mounting the socket would let the container control Docker Engine and break isolation. However, to support some legitimate use cases, this restriction can be relaxed for trusted container images. For details, see Docker socket mount permissions.
Vetting sensitive system calls
Another feature of Enhanced Container Isolation is that it intercepts and vets a few highly sensitive system calls inside containers, such as mount and umount. This prevents processes that hold the capabilities to execute these system calls from using them to breach the container.
For example, a container with CAP_SYS_ADMIN (the capability required to execute the mount system call) can't use that capability to change a read-only bind mount into a read-write mount:
$ docker run -it --rm --cap-add SYS_ADMIN -v $HOME:/mnt:ro alpine
/ # mount -o remount,rw /mnt /mnt
mount: permission denied (are you root?)
In this example, the $HOME directory is mounted read-only at the container's /mnt directory, and this can't be changed from inside the container. This mechanism prevents container processes from using mount or umount to breach the container's root filesystem.
Note, however, that in the previous example mount operations inside the container (for example, creating a temporary filesystem and switching it between read-only and read-write) are still allowed. These operations take place within the container, so they do not breach its root filesystem:
/ # mkdir /root/tmpfs
/ # mount -t tmpfs tmpfs /root/tmpfs
/ # mount -o remount,ro /root/tmpfs /root/tmpfs
/ # findmnt | grep tmpfs
└─/root/tmpfs tmpfs tmpfs ro,relatime,uid=100000,gid=100000
/ # mount -o remount,rw /root/tmpfs /root/tmpfs
/ # findmnt | grep tmpfs
└─/root/tmpfs tmpfs tmpfs rw,relatime,uid=100000,gid=100000
This feature, combined with the user namespace, ensures that even a container process holding all Linux capabilities can't use them to breach the container.
Finally, Enhanced Container Isolation vets system calls in a way that does not affect performance for most container workloads. Specifically, it intercepts only the rarely used control-path system calls and leaves the commonly used data-path system calls alone.
Filesystem user ID mappings
As mentioned earlier, Enhanced Container Isolation enables the Linux user namespace on all containers. This maps the container's user ID range (0 to 64K) to a range of "real" unprivileged user IDs in the Docker Desktop Linux VM (for example, 100000 to 165535).
Moreover, each container gets an exclusive range of real user IDs in the Linux VM (for example, container 0 gets 100000 to 165535, container 1 gets 165536 to 231071, container 2 gets 231072 to 296607, and so on). The same applies to group IDs. In addition, if a container is stopped and restarted, there is no guarantee that it receives the same mapping as before. This is by design and further improves security.
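The example ranges follow a simple pattern, which the sketch below reproduces. uid_range is a hypothetical helper, and the base of 100000, the size of 65536, and the back-to-back ordering are assumptions taken from the examples in this section. Sysbox does not guarantee that ranges are assigned in this order.

```shell
# uid_range is a hypothetical helper, not part of Docker or Sysbox.
# Assumption: ranges are handed out back to back, starting at 100000,
# with 65536 IDs each, as in the examples above.
uid_range() {
    index=$1
    base=100000
    size=65536
    first=$((base + index * size))
    last=$((first + size - 1))
    echo "$first $last"
}

for i in 0 1 2; do
    echo "container $i: $(uid_range "$i")"
done
# container 0: 100000 165535
# container 1: 165536 231071
# container 2: 231072 296607
```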
However, this presents a problem when mounting Docker volumes into containers. Files written to a volume carry the real user and group IDs, so they may become inaccessible across a container's start, stop, and restart, or between containers.
To solve this problem, Sysbox performs "filesystem user ID remapping" using the Linux kernel's ID-mapped mounts feature (added in 2021) or the alternative shiftfs module. This maps filesystem accesses from the container's real user IDs (for example, the range 100000 to 165535) to the range 0 to 65535 inside the Linux VM. With this mechanism, volumes can be mounted into or shared across containers even though each container uses an exclusive user ID range. Users no longer need to worry about a container's real user IDs.
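The effect of the remapping on file ownership can be illustrated with simple arithmetic. This is a simplified model rather than how ID-mapped mounts are implemented: remap_to_volume_uid is a hypothetical helper, and the range starts are the example values used in this section.

```shell
# remap_to_volume_uid is a hypothetical helper that models the effect
# of filesystem user ID remapping: a real user ID inside a container's
# range is shifted back into the 0 to 65535 range used on the volume.
remap_to_volume_uid() {
    real_uid=$1
    range_start=$2
    echo $((real_uid - range_start))
}

# Container user 1000 in two containers with different ranges
remap_to_volume_uid 101000 100000   # first container, prints 1000
remap_to_volume_uid 166536 165536   # second container, prints 1000
```

Both containers end up with the same owner ID on the volume, which is why files written by one remain accessible to the other.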
Although filesystem user ID remapping may cause a container to access Linux VM files with real user ID 0, the bind mount restrictions prevent sensitive Linux VM files from being mounted into the container.
Procfs and sysfs emulation
Another feature of Enhanced Container Isolation is that it partially emulates the /proc and /sys filesystems inside each container. This hides sensitive host information from the container and namespaces host kernel resources that the Linux kernel itself does not yet namespace.
For example, when Enhanced Container Isolation is enabled, the /proc/uptime file shows the uptime of the container itself rather than that of the Docker Desktop Linux VM:
$ docker run -it --rm alpine
/ # cat /proc/uptime
5.86 5.86
In contrast, when Enhanced Container Isolation is disabled, the file shows the uptime of the Docker Desktop Linux VM. Although this example is trivial, it shows how Enhanced Container Isolation aims to hide the Linux VM's configuration and information from containers, making the VM harder to breach.
In addition, several resources under /proc/sys are emulated. Because the Linux kernel does not namespace these resources, each container gets its own view of them. Sysbox reconciles the values of these resources across containers and programs the corresponding Linux kernel settings accordingly.
As a result, container workloads that would otherwise require truly privileged containers (because they need access to non-namespaced kernel resources) can run securely with Enhanced Container Isolation enabled, which improves security.