Reputation: 73
I have this text:
a aa aaa aaa aaaa aa aaa
And I need to catch all the aaa sequences in the text, but ignore them if there are four in a row, like aaaa. In the ideal case, I would be able to detect this:
a aa **aaa** **aaa** aaaa aa **aaa**
Currently I have this regular expression:
[^a]aaa[^a]
This works well with the first and the last sequence 'aaa', but it can't catch the second one, since the space between aaa aaa belongs to the first pattern.
a aa **aaa** aaa aaaa aa **aaa**
Any ideas on how to make this regex?
Upvotes: 0
Views: 211
Reputation: 4674
I'll assume that you also want to catch the aaa when it's embedded in a longer string rather than delimited by spaces, e.g.
aaabbccaabccaccbbbaaaccbbaaaaccbbaacccaaab
^^^               ^^^                 ^^^
In this case, a negative lookaround would be your best bet:
re.findall('(?<!a)aaa(?!a)', mystring)
(?<!a) means "not preceded by an a".
aaa matches your three a's.
(?!a) means "not followed by an a".
Thus, the above only matches aaa without any additional a's directly before or after the matching three.
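Here is a minimal sketch of the lookaround pattern applied to both sample strings from this thread. The variable names (`spaced`, `embedded`, `match_starts`) are just illustrative, not anything from the question. Because lookarounds don't consume characters, the space between two adjacent aaa groups stays available to both matches.

```python
import re

# Negative lookbehind and lookahead: exactly three a's,
# with no extra "a" touching either side.
pattern = re.compile(r"(?<!a)aaa(?!a)")

spaced = "a aa aaa aaa aaaa aa aaa"
embedded = "aaabbccaabccaccbbbaaaccbbaaaaccbbaacccaaab"


def match_starts(text):
    """Return the start index of every match in text."""
    return [m.start() for m in pattern.finditer(text)]


print(match_starts(spaced))    # start offsets in the question's text
print(match_starts(embedded))  # start offsets in the embedded example
```

The run of four a's is skipped in both strings, since every three-letter window inside it has an a directly before or after it.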
Upvotes: 1
Reputation: 785068
You can use this regex:
\ba{3}\b
\b means a word boundary.
a{3} means match a exactly 3 times.
\ba{3}\b means match 3 a's that are surrounded by word boundaries, hence aaaa or aaab won't be matched.
Upvotes: 5
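For comparison, a small sketch of how \ba{3}\b behaves next to the lookaround pattern from the other answer, using Python's re module. The second test string and the helper name `starts` are made up for illustration. In Python, \b is the boundary between a word character (letters, digits, underscore) and a non-word character, so the two patterns agree on the space-separated text but differ when aaa touches other letters.

```python
import re

word_boundary = re.compile(r"\ba{3}\b")
lookaround = re.compile(r"(?<!a)aaa(?!a)")

spaced = "a aa aaa aaa aaaa aa aaa"   # the question's text
mixed = "bbaaacc aaab aaa"            # hypothetical: aaa glued to other letters


def starts(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


# Same result on space-separated input.
print(starts(word_boundary, spaced) == starts(lookaround, spaced))

# On mixed input, \b rejects any aaa that touches another word character,
# while the lookaround version only rejects a neighbouring "a".
print(starts(word_boundary, mixed))
print(starts(lookaround, mixed))
```

Which one fits depends on whether aaa inside a longer word such as aaab should count as a hit.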