timmwagener

Reputation: 2526

Regular expression to match valid words at start, end and in the middle of a sentence

I have a particular problem with regular expressions. Consider this sentence of valid words:

sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*

I want these words to be split up, so I can use each one separately for operations downstream. To do this I am currently using 2 regular expressions.

One that matches the word at the start of the sentence:

(?<=^)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

and one that matches all the others:

(?<=\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

Can these two be combined into a single expression? That would save me from looping over both patterns.


Strangely enough the obvious first try:

(?<=^|\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

fails with an error:

Invalid regular expression: look-behind requires fixed-width pattern

I am using Python's re module, and pythex.org for validation.
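For reference, a minimal sketch of one way the two look-behinds might be folded into one, assuming Python's re module. The error comes from `^` being zero-width while `\s` is one character wide, so the alternation has no fixed width. A negative look-behind `(?<!\S)` ("not preceded by a non-whitespace character") is always one character wide and succeeds both at the start of the string and after whitespace. The names `SENTENCE`, `WORD` and `words` are illustrative only.

```python
import re

SENTENCE = "sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*"

# (?<!\S) replaces both (?<=^) and (?<=\s): it is fixed-width, and it holds
# at the very start of the string as well as directly after whitespace.
WORD = re.compile(r"(?<!\S)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)")

# Collect every word in one pass instead of running two expressions.
words = [m.group("pattern") for m in WORD.finditer(SENTENCE)]
print(words)
```

For this particular sentence the result is the same list that `SENTENCE.split()` would give; the regex only adds value if words containing other characters should be rejected.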

Upvotes: 1

Views: 272

Answers (1)

ferdy

Reputation: 7716

You can split your patterns easily with

regexs = 'sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*'.split()

Then you can iterate over the patterns like this:

import re
for regex in regexs:
    # content is the text you want to search
    m = re.findall(regex, content)

Note that this will return duplicate matches, because some patterns (sphere_a and [sS]pher*) appear more than once in the list.
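A minimal sketch of one way to avoid that: drop repeated patterns before looping. `dict.fromkeys` keeps the first occurrence of each pattern in its original order. The `content` string here is a made-up stand-in for whatever text is actually being searched.

```python
import re

sentence = "sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*"

# Remove repeated patterns while preserving their order of first appearance.
patterns = list(dict.fromkeys(sentence.split()))

# Hypothetical text to search; replace with the real input.
content = "sphere_a Sphere_b"

# Run each unique pattern once and keep the results per pattern.
matches = {pattern: re.findall(pattern, content) for pattern in patterns}
```

Keying the results by pattern also makes it easy to see which expression produced which match.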

Upvotes: 2
