Looking2learned

Reputation: 213

File with the most lines in a directory NOT bytes

I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.

To add to my frustration, the directory has to come from a passed argument. So without looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I type in as the argument? I know I'm not understanding it completely.

On top of that I need validation too, if the argument given is not a directory or there is more than one argument.

Upvotes: 6

Views: 5448

Answers (7)

SoluableNonagon

Reputation: 11755

find <directory> -type f -exec wc -l {} \; | sort -rn | head -n 20 | awk '{print $2}'

Thought I'd add something similar to what others have posted. This one finds the files with the most lines, sorts them, and prints out the file names.

Upvotes: 0

Ali Cheaito

Reputation: 3856

Here's one that works for me with git bash (mingw32) under Windows:

find . -type f -print0 | xargs -0 wc -l

This will list the files and line counts in the current directory and its subdirectories. You can also direct the output to a text file and import it into Excel if needed:

find . -type f -print0 | xargs -0 wc -l > fileListingWithLineCount.txt

Upvotes: 0

Stephane Chazelas

Reputation: 6239

To find the file with most lines in the current directory and its subdirectories, with zsh:

lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])

That defines a lines function which is going to be used as a glob sorting function that returns in $REPLY the number of lines of the file whose path is given in $REPLY.

Then we use zsh's recursive globbing **/* to find regular files (.), numerically (n) reverse sorted (O) with the lines function (+lines), and select the first one [1]. (D to include dotfiles and traverse dotdirs).

Doing it with standard utilities is a bit tricky if you don't want to make assumptions on what characters file names may contain (like newline, space...). With GNU tools as found on most Linux distributions, it's a bit easier as they can deal with NUL terminated lines:

find . -type f -exec sh -c '
  for file do
    lines=$(wc -l < "$file") &&
      printf "%s\0" "$lines:$file"
  done' sh {} + |
  tr '\n\0' '\0\n' |
  sort -rn |
  head -n1 |
  tr '\0' '\n'

Or with zsh or GNU bash syntax:

biggest= max=-1
find . -type f -print0 |
  {
    while IFS= read -rd '' file; do
      size=$(wc -l < "$file") &&
        ((size > max)) &&
        max=$size biggest=$file
    done
    [[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
  }

Upvotes: 0

TrueY

Reputation: 7610

Nice question!

I saw the answers. Some are pretty good. The find ...|xargs one is my favorite. It could be simplified anyway using the find ... -exec wc -l {} + syntax. But there is a problem: when the command line buffer is full, a separate wc -l ... is called, and each invocation prints its own <number> total line. As wc has no option to disable this feature, wc has to be reimplemented. Filtering these lines out afterwards with grep is not nice.
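To see the problem, give wc more than one file and it appends a total line; find ... -exec ... + can trigger several wc runs and thus several such lines. The sample files below are made up purely for demonstration:

```shell
# Two small sample files (names are only for demonstration)
printf 'a\nb\n' > demo1.txt
printf 'c\n'    > demo2.txt

# With more than one file argument, wc appends a "total" line
wc -l demo1.txt demo2.txt

# Filtering it out afterwards works, but is fragile,
# e.g. if a file is literally named "total"
wc -l demo1.txt demo2.txt | grep -v ' total$'

rm -f demo1.txt demo2.txt
```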

So my complete answer is

#!/usr/bin/bash

[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +

Or, using less temporary space but a little more code, in awk:

find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +

Misc

  • If it should not descend into subdirectories, add -maxdepth 1 before -type in the find command.
  • It is pretty fast. I was afraid that it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirs) the awk version ran in 3.8 sec and the wc version in 5.2 sec.
  • wc and awk treat lines not ended with \n differently: a last line with no terminating \n is not counted by wc. I prefer to count it, as awk does.
  • It does not print empty files.
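The third point is easy to check on a file whose last line lacks the trailing \n (the file name here is made up for the demo):

```shell
# Two lines, the second without a terminating newline
printf 'one\ntwo' > noeol.txt

wc -l noeol.txt                  # wc counts only the one \n-terminated line
awk 'END{print NR}' noeol.txt    # awk also counts the unterminated one: 2

rm -f noeol.txt
```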

Upvotes: 1

Vijay

Reputation: 67291

Is this what you want?

> find ./test1/ -type f|xargs wc -l
       1 ./test1/firstSession_cnaiErrorFile.txt
      77 ./test1/firstSession_cnaiReportFile.txt
   14950 ./test1/exp.txt
       1 ./test1/test1_cnaExitValue.txt
   15029 total

So your directory, which comes from the argument, goes here:

find "$your_complete_directory_path"/ -type f | xargs wc -l

Upvotes: 5

jaypal singh

Reputation: 77135

I'm trying to wc -l an entire directory and then display the filename in an echo with the number of lines.

You can do a find on the directory and use the -exec option to trigger wc -l. Something like this:

$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
      11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
      25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
      12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
      14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
      22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
      27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
       7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
      18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
      26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
      12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
      14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
      16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
      24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
      33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm

Upvotes: 1

paxdiablo

Reputation: 882028

wc works on files rather than directories so, if you want the line count of all files in the directory, you would start with:

wc -l $1/*

With various gyrations to get rid of the total, sort it and extract only the largest, you could end up with something like (split across multiple lines for readability but should be entered on a single line):

pax> wc -l $1/* 2>/dev/null
       | grep -v ' total$'
       | sort -n -k1
       | tail -1l

2892 target_dir/big_honkin_file.txt

As to the validation, you can check the number of parameters passed to your script with something like:

if [[ $# -ne 1 ]] ; then
    echo 'Whoa! Wrong parameter count'
    exit 1
fi

and you can check if it's a directory with:

if [[ ! -d $1 ]] ; then
    echo 'Whoa!' "[$1]" 'is not a directory'
    exit 1
fi
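Putting the checks and the pipeline together, a minimal sketch of the whole script could look like this (wrapped in a function so it is easy to test; the name maxlines is just a placeholder):

```shell
#!/bin/bash

# Validate arguments, then print the file with the most lines
# in the given directory (non-recursive)
maxlines() {
    if [[ $# -ne 1 ]] ; then
        echo 'Whoa! Wrong parameter count' >&2
        return 1
    fi

    if [[ ! -d $1 ]] ; then
        echo 'Whoa!' "[$1]" 'is not a directory' >&2
        return 1
    fi

    # drop wc's "total" line, sort by count, keep the largest
    wc -l "$1"/* 2>/dev/null | grep -v ' total$' | sort -n -k1 | tail -n 1
}
```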

Upvotes: 8
