user2481458

Reputation: 31

Count the number of occurrences of a substring in a string using bash

It gives a count of 2, whereas the pattern occurs three times in the string (counting overlapping occurrences):


echo "axxxaaxx" | grep -o "xx" | wc -l   # prints 2
echo "axxxaaxx" | grep -o "xx"           # prints xx twice, one per line
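To make the two counts concrete, here is a minimal pure-bash sketch (no PCRE needed) that scans the string with substring expansion. The function name count_occurrences and its overlap/nonoverlap mode argument are hypothetical, introduced only for illustration. In nonoverlap mode it skips past each match, which mirrors how grep -o consumes the matched characters; in the default overlap mode it advances one character at a time, so overlapping matches are counted too.

```bash
# Hypothetical helper: count_occurrences STRING SUBSTRING [overlap|nonoverlap]
# Compares the substring at every index using bash substring expansion.
count_occurrences() {
  local s=$1 sub=$2 mode=${3:-overlap}
  local n=${#s} m=${#sub} i=0 count=0
  if (( m == 0 )); then
    echo 0
    return
  fi
  while (( i + m <= n )); do
    if [[ ${s:i:m} == "$sub" ]]; then
      count=$(( count + 1 ))
      if [[ $mode == nonoverlap ]]; then
        i=$(( i + m ))   # skip the matched characters, like grep -o
        continue
      fi
    fi
    i=$(( i + 1 ))       # no match, or overlap mode: advance one character
  done
  echo "$count"
}

count_occurrences "axxxaaxx" "xx" nonoverlap   # prints 2
count_occurrences "axxxaaxx" "xx"              # prints 3
```

Because it only uses bash builtins, this also works on systems whose grep lacks -P, at the cost of being slower on long strings.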

Upvotes: 1

Views: 365

Answers (2)

riteshtch

Reputation: 8769

grep doesn't support overlapping regex matches: each match consumes the characters it matched, and the next search starts after them. In this case you can enable Perl-Compatible Regular Expressions (PCRE) with the -P switch (GNU grep) and use a positive lookahead assertion like this:

$ echo "axxxaaxx" | grep -oP "x(?=x)"
x
x
x
$ echo "axxxaaxx" | grep -oP "x(?=x)" | wc -l
3
$

regex1(?=regex2): a positive lookahead assertion matches regex1 only where regex2 follows it. The characters matched by regex2 are checked but NOT consumed, which is why you get 3 matches.

x(?=x): matches every x that is followed by another x.

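In the run xxx, the 1st x matches because an x follows it, the 2nd x matches for the same reason, and the 3rd x does not, because it is followed by a. The final xx adds one more match (its first x), for a total of 3.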
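The same x(?=x) idea generalizes to any literal substring: match its first character and put the rest of the substring inside the lookahead. The first character has to stay outside the lookahead because a pure lookahead such as (?=xx) is a zero-length match, and grep -o prints only non-empty matches. This is a sketch assuming GNU grep built with PCRE support; count_overlapping_pcre is a hypothetical name, and \Q...\E is used to quote the substring literally, so it also assumes the substring does not itself contain \E.

```bash
# Hypothetical helper: count_overlapping_pcre STRING SUBSTRING
# Matches the first character of SUBSTRING and requires the rest via a
# lookahead, so only one character is consumed per match.
# Assumes GNU grep with -P and that SUBSTRING contains no "\E".
count_overlapping_pcre() {
  local sub=$2
  local first=${sub:0:1} rest=${sub:1}
  printf '%s\n' "$1" | grep -oP "\Q${first}\E(?=\Q${rest}\E)" | wc -l
}

count_overlapping_pcre "axxxaaxx" "xx"   # prints 3
count_overlapping_pcre "abababa" "aba"   # prints 3 (matches start at 0, 2, 4)
```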

More info and easy examples can be found here

Upvotes: 2

Andreas Louv

Reputation: 47099

Using -P enables PCRE, which supports lookarounds:

echo "axxxaaxx" | grep -oP '(?<=x)x'           # prints each matched x on its own line
echo "axxxaaxx" | grep -oP '(?<=x)x' | wc -l   # prints 3

Here we use a lookbehind, which means we match an x that has an x immediately before it. Because the lookbehind only checks the preceding character without consuming it, the matches can overlap:

How the regex is "evaluated":

 xxx
^^
|Cursor
Looking for x at this position; there is nothing there, so this does not match

 xxx
 ^^
 |Cursor
 Looking for x at this position; it is found, so we get a match

 xxx
  ^^
  |Cursor
  Looking for x at this position; it is found, so we get a match
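The cursor walk in the diagram can be replayed in pure bash on the full input string. This is only a sketch of what the (?<=x)x check tests at each position, not how PCRE works internally: for each index i it tests whether the character at i is x and the character at i-1 is x. Index 0 is skipped because nothing precedes it, as in the first step of the diagram.

```bash
# Replay the lookbehind check (?<=x)x at every index of the string.
s="axxxaaxx"
count=0
for (( i = 1; i < ${#s}; i++ )); do
  # current character is x AND the one before it is x
  if [[ ${s:i:1} == x && ${s:i-1:1} == x ]]; then
    echo "match at index $i"
    count=$(( count + 1 ))
  fi
done
echo "total: $count"
# prints: match at index 2, match at index 3, match at index 7, total: 3
```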

Upvotes: 1
