haolun

Reputation: 334

How to remove empty directories from a tarball in-place

I extracted a layer from a Docker image; it is archived in a file called layer.tar. I want to remove the empty directories from it.

I don't want to unpack and then repack the files in that archive. I want to keep the original info, so I want to do it in-place.

I know how to delete files from a tar archive, but I don't know any simple method to delete empty directories in-place.

Upvotes: 0

Views: 709

Answers (1)

KamilCuk

Reputation: 140990

Let's create an archive t.tar in which the directories a/b/c/ and a/b/c/d/ are empty:

mkdir -p dir
cd dir
mkdir -p a/b/c/d
mkdir -p 1/2/3/4
touch a/file_a a/b/file_ab # directories a/b/c and a/b/c/d are empty
touch 1/2/3/file_123 1/2/3/4/file_1234 # directories under 1/ are not empty
tar cf ../t.tar a 1
cd ..

Using tar tf and some filtering, we can split the archive listing into directories (tmpdirs) and files (tmpfiles). Then, for each directory in tmpdirs, we can check with a simple grep whether tmpfiles contains any file below it, and remove the directories that have none using tar's --delete option:

tar tf t.tar | tee >(grep '/$' > tmpdirs) | grep -v '/$' > tmpfiles
cat tmpdirs | xargs -n1 -- sh -c 'grep -qF -- "$1" tmpfiles || echo "$1"' -- \
  | tac \
  | xargs -- tar --delete -f t.tar
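
The grep test above matches a directory name anywhere in a file path, so a top-level directory b/ would be treated as non-empty because of a/b/file_ab. One possible variant, sketched here under the assumption that tar lists directories with a trailing slash (as GNU tar does) and that member names contain no newlines, anchors the comparison to the start of the path with awk. The function name list_empty_dirs is made up for this example.

```shell
# Hypothetical helper: print the directory members of archive $1
# that have no non-directory member below them.
list_empty_dirs() {
  tar tf "$1" | awk '
    /\/$/ { dirs[n++] = $0; next }   # directory entries end with a slash
          { files[m++] = $0 }        # everything else counts as a file
    END {
      for (i = 0; i < n; i++) {
        used = 0
        for (j = 0; j < m; j++)
          # index(...) == 1 means the directory name is a prefix of the path
          if (index(files[j], dirs[i]) == 1) { used = 1; break }
        if (!used) print dirs[i]
      }
    }'
}
```

Its output can be fed to the same tac and xargs pipeline in place of the grep loop.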

Note that tac is not strictly needed, but tar lists a directory before its contents, so without it the list names a/b/c/ before a/b/c/d/. tar then removes a/b/c/ together with all its subdirectories, and when it tries to remove a/b/c/d/ it fails with a "Not found in archive" error. tac is a cheap way to fix that: tar first removes a/b/c/d/ and then a/b/c/.
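
Reversing the listing only helps when the archive stores parents before children, which may not hold if members were appended later. A sketch of an alternative, assuming GNU tar and an xargs that supports -r (skip the command when the input is empty): sort the candidate list in reverse, which always puts a/b/c/d/ ahead of its prefix a/b/c/. The function name delete_dirs_deepest_first and the list file passed to it are made up for this example.

```shell
# Hypothetical helper: delete the directories listed in file $2 from archive $1.
# A reverse byte-wise sort places every child before its parent, which is the
# order the tac step is meant to produce.
delete_dirs_deepest_first() {
  LC_ALL=C sort -r "$2" | xargs -r -- tar --delete -f "$1"
}
```

With the files from above, this would be called as delete_dirs_deepest_first t.tar list, where list holds the directory names to remove, one per line.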

Upvotes: 1
