Roman

Reputation: 9441

Clean up duplicate versions of docker images

I have maaaaany docker images taking up space.

Among those are lots of duplicates: the same image under different tags, where each tag is a different version of the same thing. For example:

my-image-name  d5c0a632266fbb32d3864c9dbf52b11ebdb03885    79584dff1249    11 months ago  575MB
my-image-name  04cd5a4ab3ebfec48a5fe66e2f6ae0520209b294    e7049365408e    11 months ago  575MB
my-image-name  3e050a876a7bed2df0f0bb2c4da5cdba75de1ca5    b04345c45d85    11 months ago  575MB
my-image-name  d97f04bbad9900af897cd54dc2b1c02ce0c06454    0e9fd34d9bf6    11 months ago  575MB
my-image-name  e03151317d6b199cbfd8a93a7dbb2a868ed77536    1e1b112a4d79    11 months ago  575MB

My go-to cleanup is:

alias dockerclean='docker images -f '\''dangling=true'\'' -q | xargs docker rmi && docker ps -f '\''status=exited'\'' -q | xargs docker rm'

And when I'm feeling aggressive

docker system prune --volumes

Anyone know of a way to clean up duplicates such as these? Extra points if I can keep the most recent one.

P.S. I don't want to name each image manually, for example with something like docker images | grep my-image-name | awk '{print $1 ":" $2}' | xargs docker rmi

Upvotes: 2

Views: 983

Answers (2)

Roman

Reputation: 9441

This is a companion to the wonderful answer by @norbjd, posted separately because it doesn't fit well in a comment.

A shell function to put in your shell initialization file (e.g. ~/.zshrc):

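dockerrmiold() {
    # Remove every repo:tag except the most recently created tag of each repository.
    # awk keeps the last line seen per repository; docker's --format output has no header row to skip.
    comm -23 \
        <(docker images --format='{{ .Repository }}:{{ .Tag }}' | sort) \
        <(docker images --format='{{ .Repository }},{{ .CreatedAt }},{{ .Tag }}' | \
            sort | \
            awk -F',' '{arr[$1]=$1":"$3} END{for (a in arr) print arr[a]}' | \
            sort) | \
        grep -v "<none>:<none>" | \
        xargs docker rmi
}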

Upvotes: 1

norbjd

Reputation: 11237

The following returns every image:tag pair except the most recently created one for each repository:

comm -23 \
    <(docker images --format='{{ .Repository }}:{{ .Tag }}' | sort) \
    <(docker images --format='{{ .Repository }},{{ .CreatedAt }},{{ .Tag }}' | \
        sort | \
        awk -F',' '{arr[$1]=$1":"$3} END{for (a in arr) print arr[a]}' | \
        sort)

(I guess this can be shortened but this is what I'm using on Linux machines)

Explanations

The first argument to comm is the output of this command:

docker images --format='{{ .Repository }}:{{ .Tag }}' | sort

which returns the sorted list of images, like this:

alpine:3.10
alpine:3.11
alpine:3.8
debian:buster
debian:stretch
[...]

The second argument to comm is also the output of a command. For each image (.Repository), it returns the most recently created tag (based on the .CreatedAt attribute):

docker images --format='{{ .Repository }},{{ .CreatedAt }},{{ .Tag }}' | \
    sort | \
    awk -F',' '{m[$1]=$1":"$3} END{for (a in m) print m[a]}'

m is a map keyed by repository. Each line overwrites the entry for its repository, and because the input is sorted by repository and then by creation date, the entry left at the end is the most recently created tag.

Example:

alpine:3.11
debian:buster
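
The map-overwrite idea can be tried without Docker. The sketch below feeds the awk program a few invented, already-sorted lines; the dates, tags and the helper name latest_per_repo are made up for illustration only:

```shell
# latest_per_repo is a hypothetical helper name. It reads sorted
# "repository,created,tag" lines on stdin and prints one repo:tag per repository.
latest_per_repo() {
    awk -F',' '{m[$1]=$1":"$3} END{for (a in m) print m[a]}' | sort
}

# Sample data (invented), already sorted by repository, then by creation date.
sample='alpine,2019-01-30 10:00:00,3.8
alpine,2019-12-24 10:00:00,3.11
debian,2019-06-01 10:00:00,stretch
debian,2019-12-28 10:00:00,buster'

# Later lines overwrite earlier ones, so only the newest tag per repository survives.
printf '%s\n' "$sample" | latest_per_repo
```

The trailing sort is there because awk's for (a in m) loop does not guarantee any particular order, while comm needs sorted input.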

comm -23 compares the two sorted lists and suppresses column 2 (lines found only in the second list) and column 3 (lines found in both). What remains is the lines found only in the first list, that is, every image:tag that is not the most recently created one for its repository.
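
As a rough illustration of what comm -23 does, here is a sketch on sample data written to temporary files, so it runs without Docker. The file names all and keep and the image lists are invented:

```shell
# Work in a throwaway directory; the file names are arbitrary.
tmp=$(mktemp -d)

# All image:tag pairs (sample data), sorted.
printf '%s\n' alpine:3.10 alpine:3.11 alpine:3.8 debian:buster debian:stretch | sort > "$tmp/all"
# The newest tag per repository (sample data), sorted.
printf '%s\n' alpine:3.11 debian:buster | sort > "$tmp/keep"

# -2 hides lines found only in "keep", -3 hides lines found in both,
# so only the lines unique to "all" are printed.
old=$(comm -23 "$tmp/all" "$tmp/keep")
printf '%s\n' "$old"

rm -rf "$tmp"
```

Both inputs go through the same sort, which matters because comm expects its inputs in the same collation order.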

If you are OK with the results (the images that will be deleted), you can append | xargs docker rmi to the command to delete those images automatically. I left this part out of the command above on purpose, because I'm sure some people would copy-paste it without first checking what it returns.
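
One possible way to preview the deletion is to put echo in front of docker rmi, so that xargs prints the command it would run instead of running it. A sketch with an invented candidate list standing in for the comm output:

```shell
# Sample candidate list (invented); in real use this would be the comm output.
candidates='alpine:3.10
alpine:3.8
debian:stretch'

# With echo in front, xargs only prints the command line it would have executed.
preview=$(printf '%s\n' "$candidates" | xargs echo docker rmi)
printf '%s\n' "$preview"
```

Once the printed command looks right, dropping the echo performs the actual removal.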

Upvotes: 3
