Reputation: 43
I am experimenting with regex. I have read up on assertions a bit and seen examples, but for some reason I cannot get this to work. I am trying to get the word after the following pattern using a look-behind.
import re
s = '123abc456someword 0001abde19999anotherword'
re.findall(r'(?<=\d+[a-z]+\d+)[a-z]+', s, re.I)
The results should be someword and anotherword, but I get this error: look-behind requires fixed-width pattern
Any help appreciated.
Upvotes: 4
Views: 1491
Reputation: 174786
Another easy method, using a lookahead instead:
>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> m = re.findall(r'[a-z]+(?= |$)', s, re.I)
>>> m
['someword', 'anotherword']
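It matches one or more letters that are immediately followed by a space or the end of the string. Note that it does not check for the digits-letters-digits prefix; it works here because every other run of letters in the input is followed by a digit.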
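As a rough sketch of what that means in practice, here is the lookahead pattern next to a prefix-anchored pattern on an input that contains a plain word. The input string is made up for illustration and is not from the question.

```python
import re

# Hypothetical input: 'hello' has no digits-letters-digits prefix in front of it.
s = 'hello 123abc456someword'

# Lookahead version: any run of letters followed by a space or the end of the string.
lookahead_hits = re.findall(r'[a-z]+(?= |$)', s, re.I)

# Prefix-anchored version: only letters that follow digits, letters, digits.
prefixed_hits = re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)

print(lookahead_hits)  # ['hello', 'someword']
print(prefixed_hits)   # ['someword']
```

So the lookahead is fine if every word in the data is known to carry the prefix, but it is not a drop-in replacement for the look-behind in general.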
Upvotes: 0
Reputation: 70732
Python's re module only supports fixed-width patterns inside look-behinds, and quantifiers such as + make the width variable. If you want to experiment with variable-length look-behinds, use the alternative third-party regex module:
>>> import regex
>>> s = '123abc456someword 0001abde19999anotherword'
>>> regex.findall(r'(?i)(?<=\d+[a-z]+\d+)[a-z]+', s)
['someword', 'anotherword']
Or simply avoid look-behind altogether and use a capturing group ( ):
>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)
['someword', 'anotherword']
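To see the fixed-width restriction in isolation, here is a small sketch using only the standard re module. The single-digit look-behind is just an illustration of something re accepts; it is not equivalent to the pattern in the question.

```python
import re

s = '123abc456someword 0001abde19999anotherword'

# Fixed-width look-behind: exactly one digit must precede the letters. re accepts this,
# but it also picks up 'abc' and 'abde', so it does not solve the original problem.
fixed = re.findall(r'(?<=\d)[a-z]+', s, re.I)
print(fixed)  # ['abc', 'someword', 'abde', 'anotherword']

# Variable-width look-behind: re refuses to compile it.
try:
    re.compile(r'(?<=\d+[a-z]+\d+)[a-z]+', re.I)
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)
```

This is why the capturing group is usually the simplest route when the prefix has no fixed length.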
Upvotes: 4
Reputation: 46861
Convert the look-behind to a non-capturing group and take the result from group 1:
(?:\d+\w+\d+)(\w+\b)
If you are only interested in [a-z], change \w to [a-z] in the above regex pattern. Here \b is added to assert the position at a word boundary.
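Sample code (Python 3; the ur'' prefix is only valid in Python 2):
import re
# Non-capturing prefix (digits, word characters, digits), then capture the trailing word.
p = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
test_str = "123abc456someword 0001abde19999anotherword"
print(p.findall(test_str))  # ['someword', 'anotherword']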
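The \w versus [a-z] choice matters because \w also matches digits. A minimal sketch of the difference, using a made-up input whose first token ends in digits (the variable names are just for illustration):

```python
import re

# Hypothetical input: the first token ends in digits and has no trailing word.
s = '123abc456 0001abde19999anotherword'

word_chars = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
letters_only = re.compile(r'(?:\d+[a-z]+\d+)([a-z]+\b)', re.IGNORECASE)

# With \w the engine backtracks and hands the last digit to the capture group.
loose = word_chars.findall(s)
# With [a-z] the first token has no trailing letters, so it is skipped.
strict = letters_only.findall(s)

print(loose)   # ['6', 'anotherword']
print(strict)  # ['anotherword']
```

If the input can contain tokens like that, the [a-z] version is probably the safer one.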
Upvotes: 3