Reputation: 69
I have about 500 text documents. In every one of them the expression "Numero de expediente" appears at least once. I want to locate every file where it appears at least twice. Every file has its own name, and I'm not sure if that's a problem (I don't know whether *.txt works the way it does in cmd on Windows). So I would like to know which documents contain that expression at least twice, and I don't know which command is more useful for that, grep or cat.
Thanks.
Upvotes: 0
Views: 1930
Reputation: 8711
You can try with Perl as well:
perl -lne '$x++ for /Numero de expediente/g; if ($x >= 2) { print $ARGV; $x = 0; close ARGV } elsif (eof) { $x = 0 }' *.txt
$x starts at 0 and is incremented for every match of the pattern (Numero de expediente), even when the pattern appears twice on the same line. As soon as there are at least 2 matches, the file name is printed, the file handle is closed with close(ARGV), and the next file is read. $x is also reset at the end of each file, so a file with a single match does not carry its count over into the next one.
Upvotes: 1
Reputation: 133610
EDIT: As per @kent's and @tripleee's comments, this version counts multiple occurrences of the string on a single line, and it also works when an awk does NOT support nextfile. It uses a flag named no_processing, which simply skips the remaining lines of a file once it is TRUE (after 2 occurrences of the string have been seen in that file).
awk 'FNR==1{count=0;no_processing=""} no_processing{next} {count+=gsub("Numero de expediente","")} count>=2{print FILENAME;no_processing=1}' *.txt
Or, in multi-line form:
awk '
FNR==1{
  count=0
  no_processing=""
}
no_processing{
  next
}
{
  count+=gsub("Numero de expediente","")
}
count>=2{
  print FILENAME
  no_processing=1
}
' *.txt
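To see how the occurrence-counting version behaves, here is a minimal sketch that runs it against three throwaway files. The file names, their contents, and the temporary directory are invented for illustration; only the awk logic comes from the answer above.

```shell
# Hypothetical test data: names and contents are made up for this sketch.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'Numero de expediente: 1\n' > one.txt
printf 'Numero de expediente: 1\nNumero de expediente: 2\n' > two_lines.txt
printf 'Numero de expediente 1, Numero de expediente 2\n' > same_line.txt

# gsub returns the number of substitutions it made on the line,
# so count tracks occurrences rather than matching lines.
result=$(awk '
FNR==1        { count=0; no_processing=0 }   # new file: reset state
no_processing { next }                       # already reported: skip the rest
              { count += gsub("Numero de expediente", "") }
count>=2      { print FILENAME; no_processing=1 }
' *.txt | LC_ALL=C sort)

# Prints same_line.txt and two_lines.txt, one per line.
printf '%s\n' "$result"
```

one.txt is not listed because it contains the string only once.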
Could you please try the following; it should work with GNU awk. Note that it counts matching lines, so two occurrences on the same line are counted once.
awk 'FNR==1{count=0} /Numero de expediente/{count++} count==2{print FILENAME " has at least 2 instances of searched string in it.";nextfile}' *.txt
The above will print, e.g., test.txt has at least 2 instances of searched string in it.
If you simply want to print the file names, try the following:
awk 'FNR==1{count=0} /Numero de expediente/{count++} count==2{print FILENAME;nextfile}' *.txt
Explanation of the above code:
awk '                      ##Start of the awk program.
FNR==1{                    ##True on the 1st line of each new Input_file (several Input_files are read here).
  count=0                  ##Reset the variable count to ZERO for the new Input_file.
}                          ##End of the FNR block.
/Numero de expediente/{    ##True if the line contains the string Numero de expediente.
  count++                  ##Increment count by 1.
}                          ##End of the string-matching block.
count==2{                  ##True once 2 matching lines have been seen in this Input_file.
  print FILENAME           ##Print the Input_file name; FILENAME is a built-in awk variable holding the current Input_file name.
  nextfile                 ##Skip the rest of this Input_file; 2 instances were found, so reading further is NOT needed and this saves time.
}                          ##End of the count block.
' *.txt                    ##*.txt passes all files with the .txt extension to awk.
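The difference between counting matching lines (as in the program just explained) and counting occurrences (as in the EDIT version) can be sketched on a single line that holds the string twice. The temporary file and the variable names are assumptions made for this example only.

```shell
# Hypothetical one-line input containing the string twice.
tmpfile=$(mktemp)
printf 'Numero de expediente 1, Numero de expediente 2\n' > "$tmpfile"

# Pattern match: increments once per matching line.
line_count=$(awk '/Numero de expediente/{count++} END{print count+0}' "$tmpfile")

# gsub: returns the number of occurrences replaced on each line
# ("&" puts the matched text back, so the line is left unchanged).
occurrence_count=$(awk '{count+=gsub(/Numero de expediente/,"&")} END{print count+0}' "$tmpfile")

# Prints: matching lines: 1, occurrences: 2
echo "matching lines: $line_count, occurrences: $occurrence_count"

rm -f "$tmpfile"
```

So a file like this one would be missed by the line-based program and reported by the gsub-based one.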
Upvotes: 1
Reputation: 195179
I would add another way with grep and awk. grep is responsible for the matching; awk keeps only the files whose match count is >= 2:
grep -o -m2 'YOUR_PATTERN' *.txt | awk -F: '{a[$1]++}END{for(x in a)if(a[x]>1)print x}'
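To make the two stages easier to follow, this sketch prints what grep hands to awk before the counting happens. The two sample files and the temporary directory are invented for illustration, and the approach assumes file names that contain no colon, since awk splits on ":".

```shell
# Hypothetical test data: names and contents are made up for this sketch.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf 'Numero de expediente: 1\n' > one.txt
printf 'Numero de expediente 1, Numero de expediente 2\n' > same_line.txt

# Stage 1: with several files, grep -o prints one "file:match" line per occurrence.
grep_output=$(grep -o -m2 'Numero de expediente' *.txt)
printf '%s\n' "$grep_output"

# Stage 2: awk takes the part before the first ":" as the file name,
# counts lines per file, and prints the names seen more than once.
result=$(printf '%s\n' "$grep_output" |
  awk -F: '{a[$1]++} END{for (x in a) if (a[x]>1) print x}')

# Prints: same_line.txt
printf '%s\n' "$result"
```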
Note:
-o handles the case of multiple occurrences on the same line.
-m2 improves performance: grep stops processing a file after two matching lines.
Upvotes: 2