ThinkGeek

Reputation: 5147

Recursively finding files in list of directories

How do I recursively count files in a list of Linux directories?

Example:

/dog/
  /a.txt
  /b.txt
  /c.ipynb

/cat/
  /d.txt
  /e.pdf
  /f.png
  /g.txt

/owl/
  /h.txt

I want the following output:

5 .txt
1 .ipynb
1 .pdf
1 .png

I tried the following, with no luck.

find . -type f | sed -n 's/..*\.//p' | sort | uniq -c

Upvotes: 2

Views: 133

Answers (3)

kvantour

Reputation: 26531

Assume you have a known directory path containing the subdirectories foo, bar, baz, qux, quux and gorge, and you want to count file types by extension, but only under the subdirectories foo, baz and qux.

The simplest approach is:

$ find /path/{foo,baz,qux} -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c

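The -exec part runs sh with the file path as $0 and prints the extension using the parameter expansion ${0##*.}, which strips everything up to and including the last dot. A path that contains no dot at all is printed unchanged.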
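To see what the ${...##*.} expansion does in isolation, here is a minimal sketch. The function name ext_of and the "(none)" label are hypothetical, not part of the answer above; the sketch also strips the directory part first so that a dot in a directory name is not mistaken for an extension.

```shell
# Hypothetical helper (not from the answer): print the extension of a path.
ext_of() {
    path=$1
    name=${path##*/}                          # drop directories: /dog/c.ipynb -> c.ipynb
    case $name in
        *.*) printf '.%s\n' "${name##*.}" ;;  # keep the text after the last dot
        *)   printf '%s\n' '(none)' ;;        # file name has no dot at all
    esac
}

ext_of /dog/a.txt            # prints .txt
ext_of /cat/archive.tar.gz   # prints .gz
ext_of /owl/README           # prints (none)
```

Only the last dot counts, so archive.tar.gz is reported as .gz, which matches how the one-liner above behaves.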

Upvotes: 1

Timur Shtatland

Reputation: 12405

Use Perl one-liners to make the output in the format you need, like so:

find . -type f | perl -pe 's{.*[.]}{.}' | sort | uniq -c | perl -lane 'print join "\t", @F;' | sort -nr

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
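For comparison, the tab-joining step can also be sketched without Perl. This is an illustrative alternative, not the method of the answer: it swaps the final perl -lane stage for awk and assumes extensions contain no whitespace. The input list is sample data taken from the question's example tree.

```shell
# Sample extensions, as the first stage of the pipeline would emit them
# (illustrative data based on the question's example).
report=$(
    printf '%s\n' .txt .txt .ipynb .txt .pdf .png .txt .txt |
        sort | uniq -c |
        awk '{ print $1 "\t" $2 }' |   # drop uniq's padding, join count and extension with a tab
        sort -nr                       # highest count first
)
printf '%s\n' "$report"
```

As with the Perl version, the count and the extension end up separated by a single tab, with the most frequent extension on the first line.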

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

Upvotes: 1

anubhava

Reputation: 785761

This find + gawk may work for you:

find . -type f -print0 |
awk -v RS='\0' -F/ '{sub(/^.*\./, ".", $NF); ++freq[$NF]} END {for (i in freq) print freq[i], i}'

Using -print0 in find makes the pipeline safe for file names that contain whitespace, newlines, or glob characters. Likewise, -v RS='\0' in awk makes the NUL byte the record separator, so each file name is read as exactly one record.
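A small experiment shows why the NUL delimiter matters. This is a sketch under assumptions: it creates a throwaway directory with mktemp, and the file names are made up for illustration. One name contains a newline, which newline-delimited output cannot represent as a single record.

```shell
tmp=$(mktemp -d)
touch "$tmp/plain.txt"
touch "$tmp/two
lines.txt"                     # file name containing a newline

# Newline-delimited: the odd name is split across two lines.
by_line=$(find "$tmp" -type f | wc -l)

# NUL-delimited: keep only the NUL bytes and count them, one per file.
by_nul=$(find "$tmp" -type f -print0 | tr -dc '\0' | wc -c)

echo "line records: $((by_line)), NUL records: $((by_nul))"
rm -rf "$tmp"
```

The line-based count reports three records for two files, while the NUL-based count reports two.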

Upvotes: 1
