machineghost

Reputation: 35790

Grep Shell Scripting: How do I Count the Number of Occurrences of Each Substring?

Stack Overflow already has some great posts about counting occurrences of a string (e.g. "foo"), like this one: count all occurrences of string in lots of files with grep. However, I've been unable to find an answer to a slightly more involved variant.

Let's say I want to count how many instances of "foo:[*whatever*]*whatever else*" exist in a folder; I'd do:

grep -or 'foo:[^[:space:]]*' * | wc -l

and I'd get back "55" (or whatever the count is). But what if I have a file like:

foo:bar abcd
foo:baz efgh
not relevant line
foo:bar xyz

and I want to count how many instances there are of foo:bar vs. foo:baz, etc.? In other words, I'd like output that's something like:

bar 2
baz 1

I assume there's some way to chain greps, or use a different command from wc, but I have no idea what it is ... any shell scripting experts out there have any suggestions?

P.S. I realize that if I knew the set of possible substrings (i.e. if I knew there was only "foo:bar" and "foo:baz") this would be simpler, but unfortunately the set of "things that can come after foo:" is unknown.

Upvotes: 3

Views: 5570

Answers (1)

Gumbo

Reputation: 655239

You could use sort and uniq -c:

$ grep -ohrE 'foo:[^[:space:]]+' * | sort | uniq -c
      2 foo:bar
      1 foo:baz
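
To get the exact "bar 2" layout asked for in the question, one possible sketch strips the foo: prefix with sed and swaps the two columns with awk. The function name count_foo_keys is hypothetical, and the pattern assumes each key ends at the first whitespace character.

```shell
# Hypothetical helper: count each distinct key that follows "foo:"
# in all files under a directory (defaults to the current one).
count_foo_keys() {
    # grep -o prints only the matched text, -h omits file names,
    # -r recurses, -E enables extended regular expressions.
    # sed drops the "foo:" prefix so only the key remains.
    # sort groups identical keys, which uniq -c needs in order to count them.
    # awk reorders "count key" into "key count".
    grep -ohrE 'foo:[^[:space:]]+' "${1:-.}" |
        sed 's/^foo://' |
        sort |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

Calling count_foo_keys in a directory that holds only the sample file from the question should print "bar 2" and "baz 1" on separate lines.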

Upvotes: 7
