Reputation: 2121
I am trying to write a shell script that searches for a regular expression in each file in the current directory, without using temp files.
Originally, I did this by storing the output of echo * | sed 's/ /\n/g' in a temp file, then looping through each line of that file, running cat on each name, grepping for my expression, and counting the lines of output. The temp files themselves kept getting searched, so I was wondering whether I could do everything with variables or some other method that avoids temp files (I don't really want to create a separate directory for the temp files either).
The problem I had with variables was that, after setting a variable to the output of echo * | sed 's/ /\n/g', I didn't know how to loop through each line so I could get the expression count for each file.
I just want the following to work (where I hardcode the expression):
% ls
file1 file2 file3
% ./countMost.sh
file2(28)
% ls
file1 file2 file3
signifying that file2 has the most instances of the expression (28 of them).
Upvotes: 1
Views: 86
Reputation: 14390
This should give you the ten most common lowercase words, with counts, across all the files inside a directory called test (you can change the regex to whatever you need):
grep -rhoE "[a-z]+" test | sort | uniq -c | sort -r | head
3 test
2 wow
2 what
2 oh
2 foo
2 bar
1 ham
If you want the counts broken down by filename, remove the -h flag from grep:
grep -roE "[a-z]+" test | sort | uniq -c | sort -r | head
3 test/2:test
1 test/2:wow
1 test/2:what
1 test/2:oh
1 test/2:foo
1 test/2:bar
1 test/1:wow
1 test/1:what
1 test/1:oh
1 test/1:ham
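If what you want is one total per file rather than one count per word, the same -o idea can be wrapped in a loop. This is only a rough sketch: most_occurrences is a name I made up, it assumes a grep that supports -o (GNU and BSD grep do), and it counts every match, so a line containing the expression three times counts as 3 where grep -c would report 1.

```shell
# Sketch: print "<count> <file>" for the file with the most occurrences of a regex.
# most_occurrences is a hypothetical helper name, not a standard command.
most_occurrences() {
    regex=$1
    shift
    for f in "$@"; do
        # grep -o prints each match on its own line, so wc -l counts occurrences.
        # The arithmetic expansion strips any padding that wc adds.
        n=$(( $(grep -oE -e "$regex" -- "$f" | wc -l) ))
        printf '%s %s\n' "$n" "$f"
    done | sort -rn | head -n 1
}
```

Called as most_occurrences 'qwe' file*, it prints the highest count followed by the file name. No temp files are involved because the loop output is piped straight into sort.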
Upvotes: 0
Reputation: 2670
A variant of Job Lin's solution that uses sort options instead of sed:
grep -c -e "^d" file* | sort -n -k2 -t: -r |head -1
(here I look for lines starting with a 'd')
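To get the file2(28) format from the question, the winning line can be passed through awk. A minimal sketch, assuming at least two files are given (with a single file, grep -c prints no file name) and that no file name contains a colon; most_matching_lines is just a name I picked for illustration:

```shell
# Sketch: print "name(count)" for the file with the most matching lines.
# most_matching_lines is a hypothetical helper name.
most_matching_lines() {
    regex=$1
    shift
    # grep -c prints "file:count" per file; sort on the count, numerically, descending.
    grep -c -e "$regex" -- "$@" |
        sort -t: -k2,2nr |
        head -n 1 |
        awk -F: '{ printf "%s(%s)\n", $1, $2 }'
}
```

For example, most_matching_lines '^d' file* prints a single line for the file with the highest count.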
Upvotes: 1
Reputation: 143906
You can try something like this:
grep -c regex files | sed -e 's/^\(.*\):\(.*\)$/\2 \1/' | sort -r -n | head -n 1
Where regex is your regular expression (you can use egrep as well) and files is your list of files.
Given 3 files:
file1:
qwe
qwe
qwe
asd
zxc
file2:
qwe
asd
zxc
file3:
asd
qwe
qwe
qwe
qwe
and I run:
grep -c 'qwe' file[1-3] | sed -e 's/^\(.*\):\(.*\)$/\2 \1/' | sort -r -n
I get the output:
4 file3
3 file1
1 file2
Adding | head -n 1 at the end gives me only:
4 file3
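If you would rather loop over the files yourself, as in the question, you can iterate over the glob directly and keep the running maximum in variables, so no temp file and no echo * | sed is needed. This is a rough sketch rather than a finished script: count_most is a name I made up, it counts matching lines (like grep -c above), and on a tie it keeps the first file in glob order.

```shell
#!/bin/sh
# Sketch: print "name(count)" for the regular file in the current directory
# that has the most lines matching a regex. count_most is a hypothetical name.
count_most() {
    regex=$1
    best_file=
    best_count=-1
    for f in *; do
        # Skip directories, unreadable files, and an unmatched literal "*".
        [ -f "$f" ] && [ -r "$f" ] || continue
        # grep -c prints the number of matching lines (0 if there are none).
        n=$(grep -c -e "$regex" -- "$f" 2>/dev/null) || true
        n=${n:-0}
        if [ "$n" -gt "$best_count" ]; then
            best_count=$n
            best_file=$f
        fi
    done
    if [ -n "$best_file" ]; then
        printf '%s(%s)\n' "$best_file" "$best_count"
    fi
}
```

With the three example files above, count_most 'qwe' prints file3(4). If you save this as countMost.sh inside the directory being searched, the script itself will be scanned too, so you may want to skip it in the loop or keep it elsewhere.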
Upvotes: 2