Charlie

Reputation: 21

Find all repeated (overlapping) matches of a pattern with a regular expression

I am new to programming and am trying to solve this with a regex: search the string for the word 'can' (ignoring case) and, for every occurrence, display the two words before it and the two words after it. For example, with this string:

import re

string = "CAN CAn Can cAN cAn caN can"
pattern = re.compile(r'(\S+\s+\S+)\s+can\s+(\S+\s+\S+)', re.I)
matches = pattern.findall(string)
print(matches)

expected result:

[('CAN CAn', 'cAN cAn'), ('CAn Can', 'cAn caN'), ('Can cAN', 'caN can')]

actual result:

[('CAN CAn', 'cAN cAn')]

Upvotes: 1

Views: 282

Answers (1)

vks

Reputation: 67988

(?=(\b\S+\s+\S+)\s+can\s+(\S+\s+\S+\b))

Try this. See the demo:

https://regex101.com/r/sJ9gM7/104#python

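The problem with your regex is that once the engine consumes part of the string, it cannot go back: findall returns non-overlapping matches, and your first match uses up five words, leaving too few for another match. You would need a variable-width lookbehind for this, which Python's re module does not support. What you can do instead is put the whole pattern inside a lookahead, so that the string is not consumed and every overlapping match is found. The \b anchors keep a match from starting or ending in the middle of a word.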

import re

p = re.compile(r'(?=(\b\S+\s+\S+)\s+can\s+(\S+\s+\S+\b))', re.IGNORECASE)
test_str = "CAN CAn Can cAN cAn caN can"

# The lookahead matches without consuming text, so overlapping matches are all found.
print(p.findall(test_str))
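
To see why the lookahead helps, it may be useful to compare where each pattern's matches start and end. The following is only a small sketch, using the same test string and the two patterns from this thread; the variable names (consuming, lookahead, and the *_spans lists) are made up for illustration.

```python
import re

text = "CAN CAn Can cAN cAn caN can"

# Consuming pattern from the question: one match uses up five words,
# and the scan resumes only after the end of that match.
consuming = re.compile(r'(\S+\s+\S+)\s+can\s+(\S+\s+\S+)', re.IGNORECASE)

# Lookahead pattern from the answer: the match itself is empty,
# so the scan resumes at the very next character.
lookahead = re.compile(r'(?=(\b\S+\s+\S+)\s+can\s+(\S+\s+\S+\b))', re.IGNORECASE)

consuming_spans = [m.span() for m in consuming.finditer(text)]
lookahead_spans = [m.span() for m in lookahead.finditer(text)]
lookahead_groups = [m.groups() for m in lookahead.finditer(text)]

print(consuming_spans)   # [(0, 19)]
print(lookahead_spans)   # [(0, 0), (4, 4), (8, 8)]
print(lookahead_groups)  # the three (before, after) pairs from the expected result
```

The consuming pattern produces a single span covering the first five words, while the lookahead pattern produces three zero-width spans, one at the start of each word that has a 'can' two words later and two more words after that.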

Upvotes: 1
