Reputation: 310
I have a tgz file with multiple subdirectories. How do I count the number of files in each subdirectory without untarring the file? I am using Linux CentOS on Amazon EC2.
For example, I have a tgz file with directory dialogues/[0-9] from http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/. Specifically, I'm looking at this tgz file: http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/ubuntu_dialogs.tgz
This tgz file has dialogues as the primary directory and then many subdirectories ranging from 1 - 999(?). I want to be able to count the number of files in all the subdirectories. For example, dialogues/3 has 346,108 tsv files and dialogues/4 has 269,023 tsv files. Is there a Linux command to do this without untarring the file?
I want the output to list each subdirectory's name followed by the number of files it contains. Something like:
dialogs/3 - 346108
dialogs/4 - 269023
dialogs/5 - ######
Etc. It doesn't have to be exact but that's the idea.
Upvotes: 1
Views: 252
Reputation: 98118
tar tf ex.tgz | sed -n 's!/[^/]\+$!!p' | sort | uniq -c
Test:
mkdir -p a/c
touch a/{1,2,3,4,5,6}
touch a/c/{1,2}
mkdir b
touch b/{1,2,3}
tar cvfz ex.tgz a b
The output is then:
6 a
2 a/c
3 b
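If the "name - count" layout from the question is wanted, the same pipeline can be wrapped in a small function and reformatted with awk. This is only a sketch: count_per_dir is a made-up name, and it assumes the listing arrives on standard input in the form that tar tf prints it (one path per line, directories ending in a slash).

```shell
# Hypothetical helper (the name is an assumption, not part of tar).
# Reads a `tar tf` listing on stdin and prints "dir - count" per directory.
# Usage: tar tzf ex.tgz | count_per_dir
count_per_dir() {
  # 1. sed strips the final path component; entries that end in "/" are
  #    directories, do not match, and are dropped by -n ... p.
  # 2. sort groups identical directory names so uniq -c can count them.
  # 3. awk moves the count from the front of the line to the end.
  sed -n 's!/[^/][^/]*$!!p' |
    sort |
    uniq -c |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 " - " n }'
}
```

With the test archive above, tar tzf ex.tgz | count_per_dir should print a - 6, a/c - 2 and b - 3, one per line.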
Upvotes: 0
Reputation: 9393
You can try this command:
tar tzf ubuntu_dialogs.tgz | grep dialogs | grep tsv | xargs -I {} dirname {} | uniq -c
I didn't download your 550 MB file. Instead, I tried this to count the jar files in a certain subdirectory inside one of my archives:
tar tzf NetLogo-6.0.1-64.tgz | grep app/extensions | grep jar | xargs -I {} dirname {} | uniq -c
and I get:
2 NetLogo 6.0.1/app/extensions/arduino
1 NetLogo 6.0.1/app/extensions/array
1 NetLogo 6.0.1/app/extensions/bitmap
1 NetLogo 6.0.1/app/extensions/cf
2 NetLogo 6.0.1/app/extensions/csv
8 NetLogo 6.0.1/app/extensions/gis
4 NetLogo 6.0.1/app/extensions/gogo
6 NetLogo 6.0.1/app/extensions/ls
2 NetLogo 6.0.1/app/extensions/matrix
12 NetLogo 6.0.1/app/extensions/nw
1 NetLogo 6.0.1/app/extensions/palette
1 NetLogo 6.0.1/app/extensions/profiler
2 NetLogo 6.0.1/app/extensions/r
1 NetLogo 6.0.1/app/extensions/rnd
1 NetLogo 6.0.1/app/extensions/sample
1 NetLogo 6.0.1/app/extensions/sample-scala
1 NetLogo 6.0.1/app/extensions/sound
1 NetLogo 6.0.1/app/extensions/table
6 NetLogo 6.0.1/app/extensions/vid
3 NetLogo 6.0.1/app/extensions/view2.5d
(The count of jars is in the first column.)
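Two things are worth keeping in mind for an archive with hundreds of thousands of entries: uniq -c only merges adjacent identical lines, so it relies on tar listing each directory's files together, and xargs with a replacement string runs dirname once per entry. A possible single-pass alternative is sketched below; count_ext_per_dir is a hypothetical name, and the extension is passed as an argument rather than matched with grep.

```shell
# Hypothetical helper (the name is an assumption, not a standard tool).
# Reads a `tar tf` listing on stdin and counts, per directory, the entries
# whose name ends in ".<ext>".
# Usage: tar tzf ubuntu_dialogs.tgz | count_ext_per_dir tsv
count_ext_per_dir() {
  awk -v suffix=".$1" '
    # Keep only lines that end with the requested suffix.
    substr($0, length($0) - length(suffix) + 1) == suffix {
      dir = $0
      # Strip the final path component; skip entries with no directory part.
      if (sub("/[^/]*$", "", dir)) count[dir]++
    }
    # Counts are kept in an array, so the input does not need to be grouped.
    END { for (d in count) print count[d], d }
  ' | sort -k2
}
```

The output has the count in the first column and the directory in the second, sorted by directory name.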
Upvotes: 1