Descartes

Reputation: 69

How to search a specific expression in multiple files with Awk

I have about 500 text documents. In every one of them the expression "Numero de expediente" appears at least once. I want to locate every file where it appears at least twice. Every file has its own name; I'm not sure if that's a problem (I don't know if *.txt works as it does in cmd on Windows). So I would like to know which documents contain that expression at least twice, and I don't know which command is more useful for that, grep or cat.

Thanks.

Upvotes: 0

Views: 1930

Answers (3)

stack0114106

Reputation: 8711

You can try with Perl as well

perl -lne '$x++ for /Numero de expediente/g; if ($x >= 2) { print $ARGV; $x = 0; close(ARGV) } elsif (eof) { $x = 0 }' *.txt

$x starts at 0 and is incremented for every match of the pattern (Numero de expediente), even when the pattern appears twice on the same line. Once there are at least 2 matches, the file name is printed, the file handle is closed with close(ARGV) so that reading continues with the next file, and $x is reset to 0 so the count starts fresh.

Upvotes: 1

RavinderSingh13

Reputation: 133610

EDIT: Following the comments from @kent and @tripleee, this version sums every occurrence of the string, including several occurrences on a single line. For awk implementations that do not support nextfile, it uses a flag, no_processing, which skips the remaining lines of a file once 2 occurrences have been seen in it.

awk 'FNR==1{count=0;no_processing=""} no_processing{next} {count+=gsub("Numero de expediente","")} count>=2{print FILENAME;no_processing=1}' *.txt

OR(non-one liner form of solution)

awk '
FNR==1{
  count=0
  no_processing=""
}
no_processing{
  next
}
{
  count+=gsub("Numero de expediente","")
}
count>=2{
  print FILENAME
  no_processing=1
}
' *.txt


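Original answer: could you please try the following? It uses nextfile, so it should work with GNU awk. Note that it counts matching lines, so several occurrences on one line count only once.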

awk 'FNR==1{count=0} /Numero de expediente/{count++} count==2{print FILENAME " has at least 2 instances of searched string in it.";nextfile}' *.txt

The above prints, e.g., test.txt has at least 2 instances of searched string in it. If you only want the file names, try the following.

awk 'FNR==1{count=0} /Numero de expediente/{count++} count==2{print FILENAME;nextfile}' *.txt

Explanation: a commented version of the above code.

awk '                          ##Start of the awk program.
FNR==1{                        ##FNR==1 is true on the first line of each input file (several files are read in one run).
  count=0                      ##Reset count to zero for the new file.
}                              ##End of the FNR block.
/Numero de expediente/{        ##If the line contains the string Numero de expediente, do the following.
  count++                      ##Increment count by 1.
}                              ##End of the string-matching block.
count==2{                      ##If count has reached 2, do the following.
  print FILENAME               ##Print the current file name; FILENAME is a built-in awk variable.
  nextfile                     ##Skip the rest of this file: 2 instances were found, so reading further would only waste time.
}                              ##End of the count block.
' *.txt                        ##Pass every file with the .txt extension to awk.
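
To make the difference between counting matching lines and counting occurrences concrete, here is a minimal sketch. The three sample files and their names are hypothetical, created only for this illustration, and both awk programs use a flag (called found here) instead of nextfile so that the sketch does not depend on GNU awk.

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory for this illustration only.
dir=$(mktemp -d)
printf 'Numero de expediente: 1\n' > "$dir/one.txt"
printf 'Numero de expediente 1, Numero de expediente 2\n' > "$dir/same_line.txt"
printf 'Numero de expediente: 1\nNumero de expediente: 2\n' > "$dir/two_lines.txt"

# Line-based count: a line containing several matches is counted once.
by_line=$(cd "$dir" && awk '
FNR==1 { count=0; found=0 }
found  { next }
/Numero de expediente/ { count++ }
count==2 { print FILENAME; found=1 }
' *.txt)

# Occurrence-based count: gsub returns the number of matches it replaced on the line.
by_occurrence=$(cd "$dir" && awk '
FNR==1 { count=0; found=0 }
found  { next }
{ count += gsub(/Numero de expediente/, "") }
count>=2 { print FILENAME; found=1 }
' *.txt)

printf 'by line:\n%s\n' "$by_line"
printf 'by occurrence:\n%s\n' "$by_occurrence"
rm -rf "$dir"
```

With these sample files, the line-based version should report only two_lines.txt, while the occurrence-based version should also report same_line.txt.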

Upvotes: 1

Kent

Reputation: 195179

I would add another way, using grep and awk: grep does the matching, and awk keeps only the files whose match count is >= 2:

grep -o -m2 'YOUR_PATTERN' *.txt | awk -F: '{a[$1]++}END{for(x in a)if(a[x]>1)print x}'

Note:

  • -o prints each match on its own line, so multiple occurrences in the same line are all counted
  • -m2 improves performance: grep stops reading a file after two matching lines.
  • the awk part just builds a hash table keyed by file name and outputs the names whose match count is > 1
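
As a rough illustration of the notes above, this sketch runs the pipeline on three hypothetical sample files. Assumptions that are not part of the original command: the file names, the -H option (supported by GNU and BSD grep, added here so the file name prefix is printed even if the glob matches a single file), and the trailing sort (added because awk's for-in order is unspecified). File names containing a colon would confuse -F: and are not handled.

```shell
#!/bin/sh
# Hypothetical sample files, created in a temporary directory for this illustration only.
dir=$(mktemp -d)
printf 'Numero de expediente: 1\n' > "$dir/one.txt"
printf 'Numero de expediente 1, Numero de expediente 2\n' > "$dir/same_line.txt"
printf 'Numero de expediente: 1\nother text\nNumero de expediente: 2\n' > "$dir/two_lines.txt"

# grep prints one "file:match" line per occurrence; awk counts lines per file name.
result=$(cd "$dir" &&
  grep -H -o -m2 'Numero de expediente' *.txt |
  awk -F: '{ a[$1]++ } END { for (x in a) if (a[x] > 1) print x }' |
  sort)

printf '%s\n' "$result"
rm -rf "$dir"
```

Here one.txt contributes a single output line from grep and is dropped, while the other two files contribute two lines each and are printed.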

Upvotes: 2
