Reputation: 6172
I'm trying to write a function to sanitize Unicode input in a web application, and I'm currently trying to reproduce the PHP function at the end of this page: http://www.iamcal.com/understanding-bidirectional-text/
I'm looking for an equivalent of PHP's preg_match_all in Python. The re function findall returns matches without positions, and search only returns the first match. Is there any function that would return every match, along with its position in the text?
With the string 'abcdefa' and the pattern 'a|c', I want to get something like [('a',0),('c',2),('a',6)].
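For reference, here is a small sketch of what the two re functions mentioned above return on this example. It illustrates the gap: findall drops the offsets, while search keeps an offset but stops at the first match.

```python
import re

text = 'abcdefa'
pattern = 'a|c'

# re.findall returns only the matched strings, with no offsets
all_strings = re.findall(pattern, text)
print(all_strings)  # ['a', 'c', 'a']

# re.search returns a match object that carries an offset,
# but only for the first match in the string
first = re.search(pattern, text)
print(first.group(), first.start())  # a 0
```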
Thanks :)
Upvotes: 1
Views: 2305
Reputation: 113955
I don't know of a way to get re.findall to do this for you, but the following should work:
1. Use re.findall to find all the matching strings.
2. Use str.index to find the associated index of each string returned by re.findall.
However, be careful when you do this: if the same substring appears at two distinct locations, re.findall will return it twice, but you'll need to tell str.index that you're looking for the second (or nth) occurrence of that string. Otherwise, it will return an index that you already have. The best way I can think of to do this would be to maintain a dictionary that has the strings from the result of re.findall as keys and a list of indices as values.
Hope this helps
Upvotes: 0
Reputation: 37909
Try:
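import re

text = 'abcdefa'
pattern = re.compile('a|c')
# finditer yields one match object per match: m.group() is the matched text, m.start() its offset
[(m.group(), m.start()) for m in pattern.finditer(text)]
# -> [('a', 0), ('c', 2), ('a', 6)]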
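The same finditer approach can be wrapped in a small reusable helper, sketched below (the name matches_with_positions is just an illustration). The module-level re.finditer works without compiling the pattern first, and m.span() would give (start, end) if the end offset is also needed. Since the question is about sanitizing Unicode, one difference from PHP may matter: preg_match_all with PREG_OFFSET_CAPTURE reports byte offsets, whereas m.start() on a Python str counts characters (code points).

```python
import re

def matches_with_positions(pattern, text):
    # One (matched_text, start_offset) pair per non-overlapping match (group 0 only)
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]

print(matches_with_positions('a|c', 'abcdefa'))  # [('a', 0), ('c', 2), ('a', 6)]

# Offsets count code points, not UTF-8 bytes ('\u00e9' is 'é')
print(matches_with_positions('\u00e9|c', 'a\u00e9c\u00e9'))  # [('é', 1), ('c', 2), ('é', 3)]
```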
Upvotes: 15