The new stored format of Docker image on disk and Distribution

1 Introduction 自从Docker v1 10之后,Docker镜像在存储格式发生了很大的变化。主要是为了解决之前版本的镜像的一些问题: (1)安全

1 Introduction

自从Docker v1.10之后,Docker镜像在存储格式发生了很大的变化。主要是为了解决之前版本的镜像的一些问题:

(1)安全性。v1.10之前的image ID是随机生成的,与内容无关。这一方面很容易导致image ID被伪造,也容易引发冲突;

(2)image与layer没有严格的区别。每个layer都对应一个image ID和配置文件。这一方面导致image与layer的概念很模糊;另一方面,数据重复,考虑两个数据完全相同的layer,却是两个不同的layer。

为了解决上面的问题,新的格式下,image与layer是严格区分的。image ID与layer ID是两个不同的对象,生成的算法也不一样,但都是根据内容(content-addressable)生成的。正是这个区别导致了新的镜像存储在本地磁盘和registry端都发生了很大的变化。

2 Content hashes vs. distribution hashes

distribution存储的layer是压缩文件,所以使用 hashes of compressed data;而Docker在本地使用 content hashes,即 hashes of uncompressed data。为什么要做这种区分?

首先,对于registry,使用 hashes of compressed data,维护简单,而且Docker在下载时,不需要解压就可以验证文件。但是,每个push生成的 distribution hashes可能会不一样,因为同一个tar文件,每次压缩生成的文件可能会有区别。

另一方面,对于 docker builddocker savedocker loaduncompressed tars更加适合,所以本地使用 hashes of uncompressed data

aaronlehmann这里详细讨论了这些问题。

3 Distribution format

distribution顶层有2个目录, repositories存储image/tag相关的信息,blobs目录存储实际的layer数据:

# lsblobs repositories

busybox:latest为例:

# docker pull busybox:latestlatest: Pulling from busyboxc0a04912aa5a: Pull complete a3ed95caeb02: Pull complete Digest: sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658Status: Downloaded newer image for busybox:latest# docker imagesREPOSITORY TAG IMAGE ID CREATED SIZEbusybox latest 9e301a362a27 13 months ago 364.3 MB

可以看到, busybox:latest有2个layer: c0a04912aa5aa3ed95caeb02为layer的 distribution hashsha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658为tag的digest; 9e301a362a27为image ID。我们来看看这4个hash值在distribution端的存储情况。

  • tag digest
# ls repositories/busybox/_manifests/tags/latest/index/sha256/ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658
  • repository’s layers

busybox的所有layer:

# ls repositories/busybox/_layers/sha256/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1 c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34ea3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4

其中, 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1对应image ID。

  • layer data
# ls blobs/ -Rblobs/:sha256blobs/sha256:9e a3 ad c0blobs/sha256/9e:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1blobs/sha256/9e/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1:datablobs/sha256/a3:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4blobs/sha256/a3/a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4:datablobs/sha256/ad:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658blobs/sha256/ad/ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658:datablobs/sha256/c0:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34eblobs/sha256/c0/c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e:data

可以看到,多了一个 ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658的目录,保存了tag->(image ID, layers)的信息:

# cat blobs/sha256/ad/ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658/data { "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "mediaType": "application/vnd.docker.container.image.v1+json", "size": 1374, "digest": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1" }, "layers": [ { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 224153958, "digest": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e" }, { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 32, "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" } ]}

从这个目录,可以得到tag对应的image ID,以及image所有的layer。

另外, 9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1目录保存了image的信息:

# cat blobs/sha256/9e/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1/data | python -mjson.tool{ "architecture": "amd64", "config": { "AttachStderr": false, "AttachStdin": false, "AttachStdout": false, "Cmd": [ "sh" ], "Domainname": "", "Entrypoint": null, "Env": null, "Hostname": "5f8e0e129ff1", "Image": "cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54", "Labels": null, "OnBuild": null, "OpenStdin": false, "StdinOnce": false, "Tty": false, "User": "", "Volumes": null, "WorkingDir": "" }, "container": "7f652467f9e6d1b3bf51172868b9b0c2fa1c711b112f4e987029b1624dd6295f", "container_config": { "AttachStderr": false, "AttachStdin": false, "AttachStdout": false, "Cmd": [ "/bin/sh", "-c", "#(nop) CMD [/"sh/"]" ], "Domainname": "", "Entrypoint": null, "Env": null, "Hostname": "5f8e0e129ff1", "Image": "cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54", "Labels": null, "OnBuild": null, "OpenStdin": false, "StdinOnce": false, "Tty": false, "User": "", "Volumes": null, "WorkingDir": "" }, "created": "2015-09-21T20:15:47.866196515Z", "docker_version": "1.8.2", "history": [ { "created": "2015-09-21T20:15:47.433616227Z", "created_by": "/bin/sh -c #(nop) ADD file:6cccb5f0a3b3947116a0c0f55d071980d94427ba0d6dad17bc68ead832cc0a8f in /" }, { "created": "2015-09-21T20:15:47.866196515Z", "created_by": "/bin/sh -c #(nop) CMD [/"sh/"]" } ], "os": "linux", "rootfs": { "diff_ids": [ "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a", "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef" ], "type": "layers" }}

另外两个目录为layer tar data。

4 Pulling An Image

4.1 Pulling an Image Manifest

当docker拉取image时,首先会通过 Registry API获取manifest信息:

GET /v2/<name>/manifests/<reference>

The name and reference parameter identify the image and are required. The reference may include a tag or digest.

# curl -L "http://10.x.x.x:5000/v2/busybox/manifests/latest"{ "schemaVersion": 1, "name": "busybox", "tag": "latest", "architecture": "amd64", "fsLayers": [ { "blobSum": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" }, { "blobSum": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e" } ], "history": [ { "v1Compatibility": "{/"architecture/":/"amd64/",/"config/":{/"Hostname/":/"5f8e0e129ff1/",/"Domainname/":/"/",/"User/":/"/",/"AttachStdin/":false,/"AttachStdout/":false,/"AttachStderr/":false,/"Tty/":false,/"OpenStdin/":false,/"StdinOnce/":false,/"Env/":null,/"Cmd/":[/"sh/"],/"Image/":/"cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54/",/"Volumes/":null,/"WorkingDir/":/"/",/"Entrypoint/":null,/"OnBuild/":null,/"Labels/":null},/"container/":/"7f652467f9e6d1b3bf51172868b9b0c2fa1c711b112f4e987029b1624dd6295f/",/"container_config/":{/"Hostname/":/"5f8e0e129ff1/",/"Domainname/":/"/",/"User/":/"/",/"AttachStdin/":false,/"AttachStdout/":false,/"AttachStderr/":false,/"Tty/":false,/"OpenStdin/":false,/"StdinOnce/":false,/"Env/":null,/"Cmd/":[/"/bin/sh/",/"-c/",/"#(nop) CMD [///"sh///"]/"],/"Image/":/"cfa753dfea5e68a24366dfba16e6edf573daa447abf65bc11619c1a98a3aff54/",/"Volumes/":null,/"WorkingDir/":/"/",/"Entrypoint/":null,/"OnBuild/":null,/"Labels/":null},/"created/":/"2015-09-21T20:15:47.866196515Z/",/"docker_version/":/"1.8.2/",/"id/":/"bc315f8368c625be080de23e37c18bd7e555787987f907ccaf0b9bd1b69d5fe8/",/"os/":/"linux/",/"parent/":/"77106241d10a8ed96dc42f690453f89ec95890b1494ccac18bac69065e4b67fa/"}" }, { "v1Compatibility": "{/"id/":/"77106241d10a8ed96dc42f690453f89ec95890b1494ccac18bac69065e4b67fa/",/"created/":/"2015-09-21T20:15:47.433616227Z/",/"container_config/":{/"Cmd/":[/"/bin/sh -c #(nop) ADD file:6cccb5f0a3b3947116a0c0f55d071980d94427ba0d6dad17bc68ead832cc0a8f in //"]}}" } ], "signatures": [ { "header": { "jwk": { "crv": "P-256", "kid": "LOC3:5XPT:EA5R:3TC3:BJT6:I3DM:FREB:PMXJ:JOPP:PCFT:L2IF:5N3H", "kty": "EC", "x": "FEMMDfyxC4WwfbsFe9qeCuFeK5QwgAxRYazrp1HgAQc", "y": "EnOzI1hTixYK8g2_gklO2Zkvb7bP5gzCkEG_v-aOqx0" }, "alg": "ES256" }, "signature": "bZtcwb6n0cW4Ch9L3mpmD6HJX29CE7f7rdQPIEHlxlhX_6qCUK0qS6dIaM8yWlQPr7cQrAieINTWMc5cj898Eg", "protected": "eyJmb3JtYXRMZW5ndGgiOjE5MTgsImZvcm1hdFRhaWwiOiJDbjAiLCJ0aW1lIjoiMjAxNi0xMC0yNVQwMjo1ODo1N1oifQ" } ]}
# curl -L "http://10.x.x.x:5000/v2/busybox/manifests/sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658"{ "schemaVersion": 2, "mediaType": "application/vnd.docker.distribution.manifest.v2+json", "config": { "mediaType": "application/vnd.docker.container.image.v1+json", "size": 1374, "digest": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1" }, "layers": [ { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 224153958, "digest": "sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e" }, { "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", "size": 32, "digest": "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" } ]}

4.2 Pulling a Layer

对应的接口为:

GET /v2/<name>/blobs/<digest>

Access to a layer will be gated by the name of the repository but is identified uniquely in the registry by digest.

示例:

# curl -L --progress "http://10.x.x.x:5000/v2/busybox/blobs/sha256:c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e" -o layer.tar ######################################################################## 100.0%# sha256sum layer.tar c0a04912aa5afc0b4fd4c34390e526d547e67431f6bc122084f1e692dcb7d34e layer.tar

5 On-disk format

5.1 ID definitions and calculations

在讨论镜像在本地的存储之前,需要先理解几个ID及其对应的计算方法: layer.DiffIDlayer.ChainIDimage.ID:

ID scheme Meaning Calculation
layer.DiffID ID for an individual layer DiffID = SHA256hex(uncompressed layer tar data)
layer.ChainID ID for a layer and its parents. This ID uniquely identifies a filesystem composed of a set of layers. For bottom layer: ChainID(layer0) = DiffID(layer0)
For other layers: ChainID(layerN) = SHA256hex(ChainID(layerN-1) + " " + DiffID(layerN))
image.ID ID for an image. Since the image configuration references the layers the image uses, this ID incorporates the filesystem data and the rest of the image configuration. SHA256hex(imageConfigJSON)
“V1 ID1” legacy image/layer ID; originally not content-addressable Calculated for schema1 manifest:
For top layer: V1ID(layerTOP) = SHA256hex(blobsum(TOP) + " " + V1ID(layerTOP-1) + " " + imageConfigJSON)
For other layers: V1ID(layerN) = SHA256hex(blobsum(layerN) + " " + V1ID(layerN-1))

详细参考 Docker Image Specification v1.2.0ID definitions and calculations

示例:

# docker inspect 9e301a362a27[ { "Id": "sha256:9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1", ### image ID "RepoTags": [ "busybox:latest" ], "RepoDigests": [ ### tag digest "busybox@sha256:ad28941632ea14c85dcac9fd2171599cd57381b40b4aac75f9ad67a3636d6658" ], "Parent": "", "Comment": "", "Created": "2015-09-21T20:15:47.866196515Z", "Size": 364348077, "VirtualSize": 364348077, "GraphDriver": { "Name": "overlay", "Data": { "RootDir": "/data/docker/overlay/390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c/root" } }, "RootFS": { "Type": "layers", "Layers": [ ###diff IDs "sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a", "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef" ] } }]
  • Image ID

Image config -> Image ID

func (p *v2Puller) pullSchema1(ctx context.Context, ref reference.Named, unverifiedManifest *schema1.SignedManifest) (imageID image.ID, manifestDigest digest.Digest, err error) { config, err := v1.MakeConfigFromV1Config([]byte(verifiedManifest.History[0].V1Compatibility), &resultRootFS, history) if err != nil { return "", "", err } imageID, err = p.config.ImageStore.Create(config) ///image ID if err != nil { return "", "", err }}///image/store.gofunc (is *store) Create(config []byte) (ID, error) { ///image config -> image ID dgst, err := is.fs.Set(config) imageID := ID(dgst)}
  • layer.DiffID
///layer/layer_store.go///ts: tar gzip file; parent: 上一层layer.ChainIDfunc (ls *layerStore) Register(ts io.Reader, parent ChainID) (Layer, error) { ///解压layer,同时计算DiffID if err = ls.applyTar(tx, ts, pid, layer); err != nil { return nil, err }}/// 解压的同时,计算diffIDfunc (ls *layerStore) applyTar(tx MetadataTransaction, ts io.Reader, parent string, layer *roLayer) error { digester := digest.Canonical.New() ///计算DiffID tr := io.TeeReader(ts, digester.Hash()) tsw, err := tx.TarSplitWriter(true) if err != nil { return err } metaPacker := storage.NewJSONPacker(tsw) defer tsw.Close() // we're passing nil here for the file putter, because the ApplyDiff will // handle the extraction of the archive rdr, err := asm.NewInputTarStream(tr, metaPacker, nil) if err != nil { return err } applySize, err := ls.driver.ApplyDiff(layer.cacheID, parent, archive.Reader(rdr)) if err != nil { return err } // Discard trailing data but ensure metadata is picked up to reconstruct stream io.Copy(ioutil.Discard, rdr) // ignore error as reader may be closed layer.size = applySize layer.diffID = DiffID(digester.Digest())}
  • layer.ChainID
///image/rootfs_unix.go// RootFS describes images root filesystem// This is currently a placeholder that only supports layers. In the future// this can be made into an interface that supports different implementations.type RootFS struct { Type string `json:"type"` DiffIDs []layer.DiffID `json:"diff_ids,omitempty"`}// ChainID returns the ChainID for the top layer in RootFS.func (r *RootFS) ChainID() layer.ChainID { return layer.CreateChainID(r.DiffIDs)}///layer/layer.go// CreateChainID returns ID for a layerDigest slicefunc CreateChainID(dgsts []DiffID) ChainID { return createChainIDFromParent("", dgsts...)}func createChainIDFromParent(parent ChainID, dgsts ...DiffID) ChainID { if len(dgsts) == 0 { return parent } if parent == "" { return createChainIDFromParent(ChainID(dgsts[0]), dgsts[1:]...) } // H = "H(n-1) SHA256(n)" dgst := digest.FromBytes([]byte(string(parent) + " " + string(dgsts[0]))) return createChainIDFromParent(ChainID(dgst), dgsts[1:]...)}
  • layer.cacheID

此外,还有一个 layer.cacheID用于生成 graph driver的目录:

///layer/ro_layer.gotype roLayer struct { chainID ChainID diffID DiffID parent *roLayer cacheID string ///用于graph driver size int64 layerStore *layerStore descriptor distribution.Descriptor referenceCount int references map[Layer]struct{}}
///layer/layer_store.gofunc (ls *layerStore) Register(ts io.Reader, parent ChainID) (Layer, error) {///... // Create new roLayer layer := &roLayer{ parent: p, cacheID: stringid.GenerateRandomID(), referenceCount: 1, layerStore: ls, references: map[Layer]struct{}{}, descriptor: descriptor, } if err = ls.driver.Create(layer.cacheID, pid, "", nil); err != nil { ///创建graph driver目录 return nil, err }///...}

5.2 Stored format

在本地,有2个与镜像存储相关的目录,以overlayfs为例, ROOT/image/overlayROOT/overlay

前者保存image的信息,包括 imagedblayerdb

# ls image/overlay/distribution imagedb layerdb repositories.json

后者用于存储实际的数据。更多参考 这里

  • imagedb

保存image的信息。

### image ID为目录名称# ls image/overlay/imagedb/content/sha256/9e301a362a270bcb6900ebd1aad1b3a9553a9d055830bdf4cab5c2184187a2d1
  • layerdb

保存layer的信息。

### image/overlay/layerdb/sha256/保存每个layer的元数据信息,目录名称为layer的chainID# ls image/overlay/layerdb/sha256/ -R image/overlay/layerdb/sha256/:75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6 ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686aimage/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6:cache-id diff parent size tar-split.json.gzimage/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a:cache-id diff size tar-split.json.gz

cache-id:

#### cache-id 为graph driver的目录名称# cat image/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a/cache-id 5b0bd80f386aebead9ee65c7d18172e6d937303bcc63490ef1378f92c1fb9a36# cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/cache-id 390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c# ls overlay/390c9a37d1edca218961b4133c67992b6e62b1b1c1cdf820494ff4df5b9fce7c 5b0bd80f386aebead9ee65c7d18172e6d937303bcc63490ef1378f92c1fb9a36

diff:

### diff保存当前layer的diff ID# cat image/overlay/layerdb/sha256/ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a/diff sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a ###chainID == diffID(bottom layer)# cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/diff sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef

parent:

### parent保存上一层layer的chainID# cat image/overlay/layerdb/sha256/75a46a4a46d9b53d8bbd70d52a26dc08858961f51156372edf6e8084ba9cfdb6/parent sha256:ae2b342b32f9ee27f0196ba59e9952c00e016836a11921ebc8baaf783847686a

6 Summary

新的存储格式涉及的ID比较多,理解了这些ID,也就理解了存储格式。这里简单总结一下:

(1)Image ID与Layer ID的计算方法不同。Image ID是在本地由Docker根据镜像的描述文件计算的,并用于imagedb的目录名称。Layer ID一般指 Distribution根据layer compressed data计算的,并用于 Distribution的存储目录名称。

(2) layer.DiffID是在本地由Docker根据layer uncompressed data计算的。 layer.ChainID只用本地,根据 layer.DiffID计算,并用于layerdb的目录名称。

(3)layer.cacheID随机生成,用于graph driver的目录名称。

未登录用户
全部评论0
到底啦