Reputation: 3161

How to grep files by regexp and number of lines

I need to find only files that do not contain the string use Test::More tests => 1; and that have more than 10 lines. How can I do that?

The typical way to print the names of files without a match is grep's -L flag, and the typical way to count lines is wc -l. But how do I combine them?

grep -rL "use Test::More tests => 1;" t | wc -l

only shows the number of files in grep's output.

Upvotes: 0

Answers (3)

Paul Hodges

Reputation: 15418

TL;DR (GNU awk, for ENDFILE and \s):

gawk '/use\s+Test::More\s+tests\s*=>\s*1\s*;/ { found=1; nextfile }
      ENDFILE { if ( !found && FNR > 10 ) print FILENAME
                found=0 }' t/*

Breaking it down, with and without grep.

To only get files with more than 10 lines:

awk 'FNR==11 { print FILENAME; nextfile; }' *

FNR is the record (line) number within the current file. If it reaches 11, the file has more than ten lines, so print the FILENAME and move on to the next file.
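To see the per-file counter in action, here is a small sketch. The temp directory, the two sample files, and the helper name more_than_ten are made up for illustration. The one-liner above also calls nextfile to stop reading early; it is left out here only to stay portable to awks that lack it, and the output is the same because FNR equals 11 at most once per file.

```shell
# Hypothetical demo files: one with 12 lines, one with 3.
demo_dir=$(mktemp -d)
seq 1 12 > "$demo_dir/long.txt"
seq 1 3  > "$demo_dir/short.txt"

# Print each file that reaches an 11th line, i.e. has more than ten lines.
# FNR restarts at 1 for every input file, so the test is per file.
more_than_ten() {
    awk 'FNR == 11 { print FILENAME }' "$@"
}

more_than_ten "$demo_dir/long.txt" "$demo_dir/short.txt"
```

Only long.txt should be listed, since short.txt never reaches line 11.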

You can save a list of files without your search string to an array with

mapfile -t lst < <( grep -rL "use Test::More tests => 1;" t )

Then you can report those with over ten lines with

awk 'FNR==11 { print FILENAME; nextfile; }' "${lst[@]}"

Though I'd recommend not being quite so rigid - sometimes people fumble-finger or align things, etc, so try it this way:

mapfile -t lst < <( grep -rLE "use\s+Test::More\s+tests\s*=>\s*1\s*;" t )
awk 'FNR==11 { print FILENAME; nextfile; }' "${lst[@]}"

You could do it all in one line with a command substitution, like so:

awk 'FNR==11 { print FILENAME; nextfile; }' $( grep -rLE "use\s+Test::More\s+tests\s*=>\s*1\s*;" t )

This also skips the intermediate array. If you want to really trim it down, it can all be done in one awk, but if you need to traverse more than the one subdirectory then you ought to use grep or find anyway. Otherwise, if you are only searching the files in the t directory and not its children -

gawk '/use\s+Test::More\s+tests\s*=>\s*1\s*;/ { found=1; nextfile }
      ENDFILE { if ( !found && FNR > 10 ) print FILENAME
                found=0 }' t/*

You could refine this if, for example, all the files you are checking have a name like *.pl, which would avoid trying to read directories and other such ugliness. Likewise, this may get confused by odd filenames.
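For what it's worth, the single-pass idea can also be sketched without GNU extensions. The version below is only an illustration: the function name list_long_without_awk and the sample files are hypothetical, it uses [ \t] where the commands above use \s, and it decides about each file when the next one starts (or at END), because a file can only be known to lack the string once it has been read to the end.

```shell
# List the given files that have more than 10 lines and never match the
# "use Test::More tests => 1;" pattern. Sketch for a POSIX awk.
list_long_without_awk() {
    awk '
        # Report the previous file if it qualified.
        function report() { if (prev != "" && !found && n > 10) print prev }
        FNR == 1 { report(); prev = FILENAME; found = 0 }
        /use[ \t]+Test::More[ \t]+tests[ \t]*=>[ \t]*1[ \t]*;/ { found = 1 }
        { n = FNR }          # line count so far in the current file
        END { report() }     # the last file has no successor to trigger it
    ' "$@"
}

# Hypothetical sample data.
work=$(mktemp -d)
seq 1 12 > "$work/long_no_plan.t"
{ seq 1 11; echo 'use  Test::More tests=>1 ;'; } > "$work/long_late_plan.t"
seq 1 3 > "$work/short.t"

list_long_without_awk "$work/long_no_plan.t" "$work/long_late_plan.t" "$work/short.t"
```

The second sample file carries the string on its last line, which is why the check cannot stop at line 11.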

But IF what you actually wanted was files with more than ten distinct lines that do NOT have your token string in them, then change the awk to (GNU awk again, for ENDFILE) -

gawk '/use\s+Test::More\s+tests\s*=>\s*1\s*;/ { found=1; nextfile }
      !hit[$0]++ { cnt++ }
      ENDFILE { if ( !found && 10 < cnt ) print FILENAME
                cnt=0; found=0; delete hit }
     ' t/*

Yes, you can squish all that into one line if you prefer but ew, don't, lol.

Upvotes: 0

anubhava

Reputation: 786091

You can run a loop using find in process substitution:

while IFS= read -d '' -r file; do
   ! grep -Fq 'use Test::More tests => 1;' "$file" &&
   (( $(wc -l < "$file") > 10 )) && echo "$file"
done < <(find . -type f -print0)

This code takes care of filenames with spaces, newlines or glob characters.
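If process substitution is not available, a similar effect can probably be had with find -exec, which also hands each file name to the shell as a separate argument. This is a sketch rather than a drop-in replacement: the function name list_long_without, its three parameters, and the sample files are all assumptions made for the example.

```shell
# $1 = directory, $2 = fixed string the file must NOT contain,
# $3 = line count the file must exceed. Plain sh, no bashisms.
list_long_without() {
    # needle and min are passed to the inner shell through the environment.
    needle=$2 min=$3 find "$1" -type f -exec sh -c '
        for file do
            if ! grep -Fq -- "$needle" "$file" &&
               [ "$(( $(wc -l < "$file") ))" -gt "$min" ]; then
                printf "%s\n" "$file"
            fi
        done' sh {} +
}

# Hypothetical sample data, including a name with a space in it.
tmp=$(mktemp -d)
seq 1 12 > "$tmp/plain test.t"
{ echo 'use Test::More tests => 1;'; seq 1 12; } > "$tmp/has_plan.t"
seq 1 3 > "$tmp/short.t"

list_long_without "$tmp" 'use Test::More tests => 1;' 10
```

Only the file with the space in its name should be printed: it has 12 lines and no plan line.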

Upvotes: 1

Mark

Reputation: 4453

grep -L lists files that do not contain the search string, so it is a fundamental part of your solution. However, by piping the result to wc -l, you are simply counting the files that do not contain the search string. As you noted, that is not what you want: you want to list the files that don't have the search string AND have more than 10 lines. Consider the following code:

grep -rL "use Test::More tests => 1;" t  | xargs wc -l | awk '$1 > 10 && $2 != "total" {print $2}'

The most interesting command here is xargs, which takes the output coming in on stdin and passes it as arguments to the next command, wc -l. wc -l then prints a line count and file name for each file, plus a final total line when it is given more than one file. This gets piped to awk, which selects the lines whose first column is greater than 10, skips the total line, and prints only the second column. Note that this assumes file names without whitespace.

You might find it useful to run the commands separately to see the output passed to the next pipe:

grep -rL "use Test::More tests => 1;" t  | xargs echo

grep -rL "use Test::More tests => 1;" t  | xargs wc -l

grep -rL "use Test::More tests => 1;" t  | xargs wc -l | awk '$1 > 10 '

And then putting it all together (skipping the total line that wc -l appends when it is given several files):

grep -rL "use Test::More tests => 1;" t  | xargs wc -l | awk '$1 > 10 && $2 != "total" {print $2}'
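If file names may contain spaces, the same pipeline can be sketched with NUL-separated names. This assumes GNU grep (-Z) and an xargs that supports -0 and -r; the function name list_long_without_xargs and the sample files are hypothetical. Names containing newlines would still confuse the final awk step.

```shell
# $1 = directory to search, $2 = fixed string the file must NOT contain.
list_long_without_xargs() {
    grep -rLZF -- "$2" "$1" |      # names of non-matching files, NUL-terminated
        xargs -0 -r wc -l |        # "<count> <name>" per file, maybe a total line
        awk '$1 > 10 && !(NF == 2 && $2 == "total") {
                 sub(/^ *[0-9]+ /, "")   # drop the count, keep the whole name
                 print
             }'
}

# Hypothetical sample data.
box=$(mktemp -d)
seq 1 12 > "$box/long one.t"
seq 1 3  > "$box/short.t"
{ echo 'use Test::More tests => 1;'; seq 1 12; } > "$box/has_plan.t"

list_long_without_xargs "$box" 'use Test::More tests => 1;'
```

Here wc -l receives two files, so it emits a total line, and the awk filter drops it along with the short file.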

Upvotes: 1
