I have a file with a large number of similar strings. I want to count the unique values captured by a regex and also show what they were. For example, given the pattern Profile: (\w*)
and the file:
Profile: blah
Profile: another
Profile: trees
Profile: blah
I want to find that there are 3 unique values, and return them:
blah, another, trees
Try this:
egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq
Output:
another
blah
trees
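Description
- egrep with the -o option prints only the part of each line that matches the pattern. (egrep is equivalent to grep -E; recent GNU grep versions warn that egrep is obsolescent.)
- sed keeps only the captured part.
- sort followed by uniq gives the list of unique elements. uniq only removes adjacent duplicates, which is why the input is sorted first.
To get the number of elements in the resulting list, append wc -l to the command: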
egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq | wc -l
Output:
3
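A minimal sketch that prints the count and the values in one go, assuming a grep that supports -o (GNU and BSD grep both do). It swaps egrep for grep -E, spells \w as the bracket expression [A-Za-z0-9_] so neither grep nor sed needs the GNU \w extension, and uses sort -u in place of sort | uniq. The input is inlined in a variable as a hypothetical stand-in for test.text:

```shell
#!/bin/sh
# Hypothetical sample input; in practice this would come from test.text.
input='Profile: blah
Profile: another
Profile: trees
Profile: blah'

# grep -o prints only the matched text; sed strips the "Profile: " prefix,
# leaving the captured word; sort -u sorts and drops duplicates.
values=$(printf '%s\n' "$input" |
  grep -oE 'Profile: [A-Za-z0-9_]*' |
  sed 's/^Profile: //' |
  sort -u)

# Count the unique values; tr removes the padding some wc versions add.
if [ -z "$values" ]; then
  count=0
else
  count=$(printf '%s\n' "$values" | wc -l | tr -d ' ')
fi

printf 'count: %s\n' "$count"
printf '%s\n' "$values"
```

With the sample input this prints count: 3 followed by another, blah and trees, one per line.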
awk '{a[$2]}END{for(x in a)print x}' file
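This works on your example. Note that it keys on the second whitespace-separated field rather than on the regex, and that for (x in a) visits the keys in no particular order: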
kent$ echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{for(x in a)print x}'
another
trees
blah
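The $2 approach relies on the value always being the second field. A sketch that applies the question's regex instead, using the POSIX awk match() function, which sets RSTART and RLENGTH. The bracket expression [A-Za-z0-9_] stands in for \w, which not every awk understands, and the keys are sorted so the output order is stable:

```shell
#!/bin/sh
# Hypothetical sample input matching the question.
input='Profile: blah
Profile: another
Profile: trees
Profile: blah'

# match() sets RSTART and RLENGTH for the leftmost match of the regex.
# Skipping the 9 characters of "Profile: " leaves the captured word,
# which becomes an array key, so duplicates collapse into one entry.
# for (x in seen) has no defined order, so the keys are piped through sort.
unique=$(printf '%s\n' "$input" |
  awk 'match($0, /Profile: [A-Za-z0-9_]*/) {
         seen[substr($0, RSTART + 9, RLENGTH - 9)]
       }
       END { for (x in seen) print x }' |
  sort)

printf '%s\n' "$unique"
```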
If you also want the count (3) in the output (length() on an array works in gawk and mawk, but not in every awk implementation):
awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }' file
With the same example:
kent$ echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }'
count: 3
another
trees
blah
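If your awk does not accept length() on an array, one workaround is to count each key the first time it is seen. A minimal sketch, still keying on the second field as in the answer above:

```shell
#!/bin/sh
# Hypothetical sample input matching the question.
input='Profile: blah
Profile: another
Profile: trees
Profile: blah'

# The pattern !($2 in seen) is true only the first time a value appears,
# so n ends up as the number of distinct values without calling length().
result=$(printf '%s\n' "$input" | awk '
  !($2 in seen) { seen[$2]; n++ }
  END {
    print "count:", n + 0
    for (x in seen) print x
  }')

printf '%s\n' "$result"
```

The first line is count: 3; the three values follow in whatever order the awk implementation iterates the array.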