Stefan

Reputation: 9329

Linux tools - how to count and list occurrences of regex in file

I have a file containing a large number of similar strings. I want to count the unique matches of a regex and also list what they were. For example, given the pattern Profile: (\w*) and the file:

Profile: blah
Profile: another
Profile: trees
Profile: blah

I want to find that there are 3 unique occurrences, and return the results:

blah, another, trees

Upvotes: 6

Views: 3542

Answers (2)

jkshah

Reputation: 11703

Try this:

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq

Output:

another
blah
trees

Description

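egrep with the -o option prints only the part of each line that matches the pattern.

sed then strips the Profile: prefix, leaving only the captured group.

sort followed by uniq gives a list of unique elements; uniq only collapses adjacent duplicates, which is why the input is sorted first.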

To get the number of elements in the resulting list, append | wc -l to the command:

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq | wc -l

Output:

3
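
The two pipelines above can be combined so that one call reports both the count and the comma-separated list the question asks for. This is only a sketch: unique_profiles and profile_summary are hypothetical helper names, grep -E is used as the equivalent of egrep, and [[:alnum:]_]+ stands in for \w* on the assumption that empty values should not be counted. It also assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Hypothetical helper: print each unique Profile value once, sorted.
# Reads from stdin.
unique_profiles() {
  grep -oE 'Profile: [[:alnum:]_]+' | sed 's/^Profile: //' | sort -u
}

# Hypothetical helper: print the number of unique values,
# then the values joined with ", " on a second line.
profile_summary() {
  unique_profiles | awk '
    NF { n++; list = list (n > 1 ? ", " : "") $0 }
    END { print n + 0; print list }
  '
}
```

With the sample file from the question, profile_summary < test.text should print 3 on the first line and another, blah, trees on the second. The list comes out sorted rather than in file order because sort -u does the deduplication.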

Upvotes: 7

Kent

Reputation: 195049

awk '{a[$2]}END{for(x in a)print x}' file

This works on your example, where the value is always the second whitespace-separated field:

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{for(x in a)print x}'
another
trees
blah

If you want the count (3) in the output as well:

awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }' file

With the same example:

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }'
count: 3
another
trees
blah
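
The awk one-liners above take the second field of every line, and awk does not guarantee any particular order for for(x in a). A possible variant, sketched below, only looks at lines whose first field is Profile: and keeps the values in first-seen order, which yields the blah, another, trees ordering from the question. profile_list is a hypothetical helper name, and the sketch assumes values contain no whitespace.

```shell
# Hypothetical helper: count unique Profile values and list them
# in the order they first appear. Reads from stdin.
profile_list() {
  awk '
    # Only Profile lines that carry a value, and only the first sighting of each value.
    $1 == "Profile:" && NF >= 2 && !($2 in seen) {
      seen[$2] = 1
      list = list (n++ ? ", " : "") $2
    }
    END { print "count:", n + 0; print list }
  '
}
```

For the sample input, profile_list < file should print count: 3 followed by blah, another, trees. Counting with a plain variable instead of length(a) also avoids relying on array support in length, which older awk implementations lack.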

Upvotes: 1
