Daniel Lee

Count the number of files in each subdirectory of a tgz file in Linux

I have a tgz file with multiple subdirectories. How do I count the number of files in each subdirectory without untarring the file? I am using Linux CentOS on Amazon EC2.

For example, I have a tgz file from http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/ whose subdirectories are named dialogs/<number>. Specifically, I'm looking at this tgz file: http://dataset.cs.mcgill.ca/ubuntu-corpus-1.0/ubuntu_dialogs.tgz

This tgz file has dialogs as the top-level directory, with many numbered subdirectories (ranging from 1 to 999, I think). For example, dialogs/3 contains 346,108 tsv files and dialogs/4 contains 269,023 tsv files. Is there a Linux command that shows the number of files in every subdirectory without untarring the file?

I want the output to list each subdirectory's name followed by the number of files it contains, something like:

dialogs/3 - 346108
dialogs/4 - 269023
dialogs/5 - ######

Etc. It doesn't have to be exact, but that's the idea.

Answers (2)

perreal

tar tf ex.tgz | sed -n 's!/[^/]\+$!!p' | sort | uniq -c
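Here is a commented version of the same pipeline, wrapped in a shell function so each stage can be read on its own. It is a sketch that assumes GNU tar (or bsdtar), which detects gzip compression by itself when listing. The function name `count_files_per_dir` and the temporary sample archive are hypothetical. The regex is written as `[^/][^/]*`, the portable POSIX spelling of the GNU-only `[^/]\+`. The sample archive mirrors the layout used in the test that follows in this answer.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <dir>" for each directory inside a
# tar archive that directly contains at least one non-directory member.
# Assumes a tar that auto-detects gzip on read (GNU tar, bsdtar).
count_files_per_dir() {
  # List member paths only; nothing is extracted to disk.
  tar tf "$1" |
    # Delete the final "/name" component and print what is left.
    # Directory entries end in "/", so they do not match and -n hides them;
    # files at the archive root (no "/") are not printed either.
    sed -n 's!/[^/][^/]*$!!p' |
    # uniq -c only merges adjacent duplicates, so sort first.
    sort |
    uniq -c
}

# Demo on a throwaway archive: a/ has 6 files, a/c/ has 2, b/ has 3.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/c" "$tmp/b"
for f in 1 2 3 4 5 6; do touch "$tmp/a/$f"; done
for f in 1 2; do touch "$tmp/a/c/$f"; done
for f in 1 2 3; do touch "$tmp/b/$f"; done
tar czf "$tmp/ex.tgz" -C "$tmp" a b
result=$(count_files_per_dir "$tmp/ex.tgz")
echo "$result"
rm -rf "$tmp"
```

On the questioner's archive the call would be `count_files_per_dir ubuntu_dialogs.tgz`.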

Test:

mkdir -p a/c
touch a/{1,2,3,4,5,6}
touch a/c/{1,2}
mkdir b
touch b/{1,2,3}
tar cvfz ex.tgz a b

The output is then:

6 a
2 a/c
3 b
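The question asked for lines like `dialogs/3 - 346108`. One way to get that shape from the `uniq -c` output above is a small awk step that moves the count to the end of each line. This is a sketch: the input below is hard-coded to the test output so the step can be tried on its own. Stripping only the leading count, rather than printing `$2`, keeps directory names that contain spaces intact.

```shell
#!/bin/sh
# Sample "uniq -c" output, copied from the test above (leading blanks included).
uniq_output='      6 a
      2 a/c
      3 b'

# Move the count from the front of each line to the end: "dir - count".
result=$(printf '%s\n' "$uniq_output" |
  awk '{
    count = $1
    sub(/^[ \t]*[0-9]+[ \t]/, "")   # strip leading blanks, the count and one separator
    print $0 " - " count
  }')
echo "$result"
```

On the real archive, the same awk program would simply be appended after `uniq -c` in the pipeline.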

knb

You can try this command:

tar tzf ubuntu_dialogs.tgz | grep dialogs | grep tsv | xargs -I {} dirname {} | sort | uniq -c
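With hundreds of thousands of `.tsv` members (346,108 in `dialogs/3` alone, per the question), starting one `dirname` process per line can be slow. A possible single-process alternative does the filtering, directory extraction and counting in one awk program. It also prints the `dir - count` format from the question. This is a sketch: the function name `count_tsv_per_dir` and the sample archive are hypothetical. It assumes that every `.tsv` member sits inside a directory, and that the installed tar understands `z`.

```shell
#!/bin/sh
# Hypothetical helper: count .tsv members per directory in a gzipped tar
# archive and print "dir - count" lines sorted by directory name.
# Assumes every .tsv member has at least one "/" in its path.
count_tsv_per_dir() {
  tar tzf "$1" |
    awk '
      /\.tsv$/ {
        sub("/[^/]*$", "")      # drop the file name, keep its directory
        count[$0]++             # tally members per directory
      }
      END {
        for (dir in count) print dir " - " count[dir]
      }' |
    sort                        # awk iterates arrays in no fixed order
}

# Demo on a small archive shaped like dialogs/<n>/<id>.tsv.
tmp=$(mktemp -d)
mkdir -p "$tmp/dialogs/3" "$tmp/dialogs/4"
touch "$tmp/dialogs/3/1.tsv" "$tmp/dialogs/3/2.tsv" "$tmp/dialogs/3/3.tsv"
touch "$tmp/dialogs/4/1.tsv"
tar czf "$tmp/sample.tgz" -C "$tmp" dialogs
result=$(count_tsv_per_dir "$tmp/sample.tgz")
echo "$result"
rm -rf "$tmp"
```

Note that plain `sort` orders names lexically, so `dialogs/10` comes before `dialogs/2`.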

I didn't download your 550 MB file. Instead, I tried this to count the jar files in a particular subdirectory of one of my own archives:

tar tzf NetLogo-6.0.1-64.tgz | grep app/extensions | grep jar | xargs -I {} dirname {} | sort | uniq -c

and got:

  2 NetLogo 6.0.1/app/extensions/arduino
  1 NetLogo 6.0.1/app/extensions/array
  1 NetLogo 6.0.1/app/extensions/bitmap
  1 NetLogo 6.0.1/app/extensions/cf
  2 NetLogo 6.0.1/app/extensions/csv
  8 NetLogo 6.0.1/app/extensions/gis
  4 NetLogo 6.0.1/app/extensions/gogo
  6 NetLogo 6.0.1/app/extensions/ls
  2 NetLogo 6.0.1/app/extensions/matrix
 12 NetLogo 6.0.1/app/extensions/nw
  1 NetLogo 6.0.1/app/extensions/palette
  1 NetLogo 6.0.1/app/extensions/profiler
  2 NetLogo 6.0.1/app/extensions/r
  1 NetLogo 6.0.1/app/extensions/rnd
  1 NetLogo 6.0.1/app/extensions/sample
  1 NetLogo 6.0.1/app/extensions/sample-scala
  1 NetLogo 6.0.1/app/extensions/sound
  1 NetLogo 6.0.1/app/extensions/table
  6 NetLogo 6.0.1/app/extensions/vid
  3 NetLogo 6.0.1/app/extensions/view2.5d

(The number of jar files is in the first column.)
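Both answers count only the files that sit directly in each directory. If there are nested directories and you want one total per `dialogs/<n>` that includes everything beneath it, one option is to cut each path down to its first two components. This sketch assumes that directory entries in the listing end in `/` (as they do in GNU tar and bsdtar output) and that every file of interest is at least two levels deep. The function name `count_recursive_per_subdir` and the sample archive are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count all non-directory members under each
# second-level directory (e.g. dialogs/3), including nested ones.
# Assumes directory entries in the tar listing end in "/".
count_recursive_per_subdir() {
  tar tzf "$1" |
    grep -v '/$' |          # drop directory entries, which end in "/"
    cut -d/ -f1-2 |         # keep only "top/sub" from each path
    sort |
    uniq -c
}

# Demo: dialogs/3 holds one file directly and one in a nested directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/dialogs/3/extra" "$tmp/dialogs/4"
touch "$tmp/dialogs/3/1.tsv" "$tmp/dialogs/3/extra/2.tsv" "$tmp/dialogs/4/1.tsv"
tar czf "$tmp/sample.tgz" -C "$tmp" dialogs
result=$(count_recursive_per_subdir "$tmp/sample.tgz")
echo "$result"
rm -rf "$tmp"
```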
