Reputation: 2526
I have a particular problem with regular expressions. Consider this sentence of valid words:
sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*
I want to split these words up so that I can use each one separately in downstream operations. To do this I am currently using two regular expressions.
One that matches the word at the start of the sentence:
(?<=^)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)
and one that matches all the others:
(?<=\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)
Can these two be combined into a single expression? That would save looping over both.
Strangely enough the obvious first try:
(?<=^|\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)
fails with an error:
Invalid regular expression: look-behind requires fixed-width pattern
I am using Python's re module, and pythex.org for validation.
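For reference, the failure can be reproduced outside pythex.org with a minimal sketch like the one below. The helper name `compile_error` is made up for illustration. The likely cause is that the two look-behind branches have different widths (`^` consumes nothing, `\s` consumes one character), which the `re` module rejects.

```python
import re

# The combined pattern from the question.
COMBINED = r'(?<=^|\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)'


def compile_error(pattern):
    """Return the re.error message raised on compile, or None if it compiles.

    Hypothetical helper, used only to show the error without a traceback.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The look-behind alternatives differ in width: '^' is 0 wide, '\s' is 1 wide.
message = compile_error(COMBINED)
print(message)
```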
Upvotes: 1
Views: 272
Reputation: 7716
You can split your patterns easily with:
regexs = 'sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*'.split()
Then you can iterate over the patterns like this:
import re
for regex in regexs:
    m = re.findall(regex, content)  # content is the text being searched
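Note that this will return duplicate matches, because some patterns appear more than once in the list (sphere_a and [sS]pher*) and different patterns can match the same text.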
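As a rough sketch of how the duplicates could be avoided, the pattern list can be deduplicated before looping. The same sketch also shows one way to fit the tokenizing into a single expression, assuming whitespace is the only separator: `(?<!\S)` is a fixed-width negative look-behind that succeeds both at the start of the string and after whitespace. The names `TOKEN` and `unique_words` are illustrative only.

```python
import re

sentence = ("sphere_a [sS]phere_b [sS]pher* [sS]pher* "
            "sph[eE]* sphere_a ^sphe* ^sp[hH]er*")

# Plain whitespace split, as suggested above.
words = sentence.split()

# Single-expression alternative: "not preceded by a non-space character"
# covers both the start of the string and a preceding whitespace character.
TOKEN = re.compile(r'(?<!\S)[\w^?*\[\]]+(?!\S)')
tokens = TOKEN.findall(sentence)

# Drop repeated patterns while keeping first-seen order
# (dict preserves insertion order in Python 3.7+).
unique_words = list(dict.fromkeys(words))

print(tokens == words)
print(unique_words)
```

Looping over `unique_words` instead of `words` avoids running the same pattern twice; it does not stop two different patterns from matching the same text.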
Upvotes: 2