xcrazy360

Reputation: 73

Regular Expression for consecutive patterns

I have this text:

a aa aaa aaa aaaa aa aaa

I need to catch every aaa sequence in the text, but ignore runs of four in a row, like aaaa. Ideally, I would be able to detect this:

a aa **aaa** **aaa** aaaa aa **aaa**

Currently I have this regular expression:

[^a]aaa[^a]

This works well for the first and the last 'aaa', but it can't catch the second one, because the space between aaa aaa has already been consumed by the first match.

a aa **aaa** aaa aaaa aa **aaa**

Any ideas on how to make this regex?

Upvotes: 0

Views: 211

Answers (2)

Pi Marillion

Reputation: 4674

I'll assume that you also want to catch the aaa when it is surrounded by characters other than spaces, e.g.

aaabbccaabccaccbbbaaaccbbaaaaccbbaacccaaab
^^^               ^^^                 ^^^  

In this case, a negative lookaround would be your best bet:

re.findall('(?<!a)aaa(?!a)', mystring)

(?<!a) means "not preceded by an a".

aaa matches your three as.

(?!a) means "not followed by an a".

Thus, the above only matches aaa without any additional as directly before or after the matching three.
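As a minimal sketch of the difference, the snippet below runs both the pattern from the question and the lookaround pattern over the question's sample text. The variable names (text, consuming, starts) are just for illustration.

```python
import re

text = "a aa aaa aaa aaaa aa aaa"

# Pattern from the question: [^a] consumes the neighbouring characters,
# so two adjacent runs cannot share the space between them, and a run
# at the very end of the string has no trailing character to match.
consuming = re.findall(r"[^a]aaa[^a]", text)

# Lookarounds only assert what is (not) next to the match;
# they do not consume any characters.
pattern = re.compile(r"(?<!a)aaa(?!a)")
starts = [m.start() for m in pattern.finditer(text)]

print(consuming)  # [' aaa ']
print(starts)     # [5, 9, 21]
```

Because the lookarounds are zero-width, the same pattern also works at the start and end of the string, where there is no neighbouring character at all.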

Upvotes: 1

anubhava

Reputation: 785068

You can use this regex:

\ba{3}\b
  • Here \b means a word boundary.
  • a{3} means match a exactly 3 times.
  • \ba{3}\b means match 3 a's that are surrounded by word boundaries, hence the aaa inside aaaa or aaab won't be matched.
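A short sketch in Python (assuming the re module, as in the other answer) shows how this behaves on the question's text, and where it differs from the lookaround approach: a run of aaa that touches any other word character (letter, digit, or underscore) is not matched.

```python
import re

word_bounded = re.compile(r"\ba{3}\b")

# On the question's text, only the stand-alone runs of three match.
starts = [m.start() for m in word_bounded.finditer("a aa aaa aaa aaaa aa aaa")]
print(starts)  # [5, 9, 21]

# No word boundary exists between "a" and another word character,
# so none of these candidates match.
rejected = word_bounded.findall("aaab aaaa xaaa")
print(rejected)  # []
```

So \b is the better fit when aaa must be a whole word, while the lookaround version is the one to use if aaa inside something like aaab should still count.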

Upvotes: 5
