Pierre

Reputation: 6172

How to get a list of character positions in Python?

I'm trying to write a function to sanitize Unicode input in a web application, and I'm currently trying to reproduce the PHP function at the end of this page: http://www.iamcal.com/understanding-bidirectional-text/

I'm looking for an equivalent of PHP's preg_match_all in Python. The re module's findall function returns the matches without their positions, and search only returns the first match. Is there a function that returns every match along with its position in the text?

With a string abcdefa and the pattern a|c, I want to get something like [('a',0),('c',2),('a',6)]
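To make the limitation concrete, here is a small sketch of what the two functions mentioned above return for the example string. It uses only the standard re module; the variable names are illustrative.

```python
import re

text = 'abcdefa'

# findall returns only the matched strings, with no offsets
print(re.findall('a|c', text))  # ['a', 'c', 'a']

# search returns a single match object: the first match only
m = re.search('a|c', text)
print(m.group(), m.start())  # a 0
```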

Thanks :)

Upvotes: 1

Views: 2305

Answers (2)

inspectorG4dget

Reputation: 113955

I don't know of a way to get re.findall to do this for you, but the following should work:

  1. Use re.findall to find all the matching strings.
  2. Use str.index to find the associated index of each string returned by re.findall. Be careful here: if the same substring appears in two different places, re.findall returns it twice, but str.index always returns the first occurrence unless you tell it where to start searching (through its optional start argument), so you would get an index you already have. The best way I can think of to handle this is to maintain a dictionary with the strings returned by re.findall as keys and lists of their indices as values.
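The two steps above can be sketched roughly as follows. This is a minimal sketch, not the answer's exact method: `findall_with_positions` is a hypothetical name, and instead of the dictionary of indices it keeps a single search cursor, relying on the fact that re.findall returns non-overlapping matches from left to right (a per-string dictionary can go wrong, e.g. for the pattern ab|b on abb, where it would place the lone b at offset 1 instead of 2).

```python
import re


def findall_with_positions(pattern, text):
    """Hypothetical helper: pair each re.findall result with an offset.

    Assumes the pattern has no capturing groups, no lookarounds and
    cannot match the empty string; otherwise the offsets may be wrong.
    """
    results = []
    cursor = 0  # findall matches are non-overlapping and in order
    for s in re.findall(pattern, text):
        pos = text.index(s, cursor)  # start argument skips earlier occurrences
        results.append((s, pos))
        cursor = pos + len(s)  # next match cannot begin inside this one
    return results


print(findall_with_positions('a|c', 'abcdefa'))
# [('a', 0), ('c', 2), ('a', 6)]
```

Under the stated assumptions this reproduces the positions the regex matched, but str.index only knows the matched text, not the context the regex used, so patterns like a(?=c) on abac can yield an earlier, wrong offset.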

Hope this helps

Upvotes: 0

samplebias

Reputation: 37909

Try:

import re
text = 'abcdefa'
pattern = re.compile('a|c')
# finditer yields one match object per match; start() is its offset in text
print([(m.group(), m.start()) for m in pattern.finditer(text)])
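The same finditer loop can report more than the start offset. The sketch below (reusing the example string; variable names are only illustrative) shows that each match object's span() returns both the start and end offsets, and that finditer reports where the regex actually matched, which matters for patterns with lookarounds such as a(?=c).

```python
import re

text = 'abcdefa'
pattern = re.compile('a|c')

# m.span() gives (start, end); end is one past the last matched character
spans = [(m.group(), m.span()) for m in pattern.finditer(text)]
print(spans)  # [('a', (0, 1)), ('c', (2, 3)), ('a', (6, 7))]

# The lookahead is checked by the regex engine, so only the 'a' before 'c' counts
print([(m.group(), m.start()) for m in re.finditer(r'a(?=c)', 'abac')])
# [('a', 2)]
```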

Upvotes: 15
