Reputation: 71
At the moment, Artifactory stores multiple duplicate Docker image layers. If image A and image B both depend on layer SHA__12345, then Artifactory stores a copy of that layer for each image. That is not a problem unless layer SHA__12345 is a gigabyte in size, in which case you can quickly run out of space.
Is there a way in Artifactory to deduplicate overlapping layers to save storage?
Thanks!
Upvotes: 6
Views: 1001
Reputation: 2770
Artifactory uses checksum-based storage:
A file that is uploaded to Artifactory first has its SHA1 checksum calculated and is then renamed to its checksum. It is then hosted in the configured filestore in a directory structure made up of the first two characters of the checksum. For example, a file whose checksum is "ac3f5e56..." would be stored in directory "ac"; a file whose checksum is "dfe12a4b..." would be stored in directory "df", and so forth.
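To make the layout concrete, here is a minimal Python sketch of how such a path could be derived from a file's content. The function name `filestore_path` is made up for illustration and is not part of Artifactory; the sketch only mirrors the "first two characters of the SHA1" rule described above.

```python
import hashlib
from pathlib import PurePosixPath


def filestore_path(content: bytes) -> PurePosixPath:
    """Return a relative path of the form <first two hex chars>/<sha1>.

    Hypothetical helper: it imitates the directory rule quoted above,
    not Artifactory's actual implementation.
    """
    checksum = hashlib.sha1(content).hexdigest()
    return PurePosixPath(checksum[:2]) / checksum


# Identical bytes always map to the same location, whatever the file was called.
print(filestore_path(b"abc"))
```

Because the path depends only on the bytes, uploading the same content twice cannot produce two different files in the filestore.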
In parallel, Artifactory creates a database entry mapping the file's checksum to the path it was uploaded to in a repository. This way of storing binaries optimizes many operations in Artifactory since they are implemented through simple database transactions rather than actually manipulating files.
One implication of this is that artifacts are deduplicated in general. Any two artifacts with the same checksum will point to the same file in storage, even if they're in different repositories. This applies to Docker layers as well as all other artifacts, so a layer shared by several images should only take up space once.
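As a rough illustration of why this deduplicates, here is a toy in-memory model in Python. The `ChecksumStore` class, the repository names, and the paths are all invented for this sketch; it models only the idea of a checksum-keyed filestore plus a path-to-checksum mapping, not Artifactory's real schema.

```python
import hashlib


class ChecksumStore:
    """Toy model: blobs keyed by SHA1, plus a (repo, path) -> checksum map."""

    def __init__(self):
        self.blobs = {}  # checksum -> bytes, stands in for the filestore
        self.paths = {}  # (repo, path) -> checksum, stands in for the database

    def upload(self, repo: str, path: str, content: bytes) -> str:
        checksum = hashlib.sha1(content).hexdigest()
        # Only store the bytes if this checksum has not been seen before.
        self.blobs.setdefault(checksum, content)
        self.paths[(repo, path)] = checksum
        return checksum

    def download(self, repo: str, path: str) -> bytes:
        return self.blobs[self.paths[(repo, path)]]


store = ChecksumStore()
layer = b"shared base layer" * 1000  # placeholder for a large layer

# Two images in two (hypothetical) repositories reference the same layer.
store.upload("docker-local", "image-a/latest/layer.tar", layer)
store.upload("docker-dev", "image-b/latest/layer.tar", layer)

print(len(store.paths), len(store.blobs))  # 2 1
```

Both paths resolve to the same bytes, but the content itself is held only once; deleting one path would just remove a mapping entry in this model.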
Upvotes: 7