Ando Jurai

Reputation: 1049

re matches: storing and substituting at the same time

I have a text with multiple references of the form "keyword1:serial number." that I need to change to "keyword2:serial number". I also need to store each "keyword2:number" in a dict, keyed by the entry being parsed. I use a regex for the substitution, and I could then parse the result again for the substituted reference, like this:

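import re
parser = re.compile(r"keyword1:(\d+?)\.")
parser2 = re.compile(r"(keyword2:\d+\W)")
db = {}
for entry in entries:
    # re.sub returns a new string; it does not modify entry in place
    new_entry = parser.sub(r"keyword2:\g<1>.", entry)
    # second pass over the rewritten text to pick up the new reference
    match = parser2.search(new_entry)
    db[entry] = match.group(1) if match else None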

But let's face it, this is inefficient: it uses two regexes and two passes over each entry. I wonder if I can use a function to list the matches (capturing only the serial numbers), use a comprehension to put keyword2 in front of them, and then store them and perform the substitution.
I know finditer() yields match objects, but those don't do the substitution for me unless I take a convoluted route: collect the match positions, substitute at those positions, and so on.
The main problem is that I want to avoid parsing each entry twice. For a small text that is fine, but on a database with hundreds of thousands of entries, writing it this way becomes bad design.
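One way to do the substitution and the bookkeeping in a single pass is to give `re.sub` a function instead of a replacement string: `re.sub` calls it once per match, so the function can record the new reference before returning the replacement text. A minimal sketch, assuming references look like `keyword1:12345.`; the helper name `rewrite_entry` and the sample entries are hypothetical.

```python
import re

# Assumed reference format: "keyword1:<digits>." (hypothetical, adjust to the real data)
PATTERN = re.compile(r"keyword1:(\d+)\.")


def rewrite_entry(entry):
    """Rewrite every keyword1 reference to keyword2 and collect the new references.

    Returns (new_text, refs) from a single regex pass over the entry.
    """
    refs = []

    def replace(match):
        # re.sub calls this once per match, so recording here needs no extra scan
        new_ref = "keyword2:" + match.group(1)
        refs.append(new_ref)
        return new_ref + "."

    new_text = PATTERN.sub(replace, entry)
    return new_text, refs


# Hypothetical sample entries
entries = [
    "see keyword1:123. and keyword1:456.",
    "no reference here",
]

db = {}
rewritten = []
for entry in entries:
    new_text, refs = rewrite_entry(entry)
    rewritten.append(new_text)
    db[entry] = refs

print(db)
print(rewritten)
```

This lets the regex engine drive the loop: a `finditer()` plus manual slicing version would also work, but `re.sub` with a callable already does that position bookkeeping.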

Upvotes: 4

Views: 68

Answers (1)

Tamas Rev

Reputation: 7166

Can you show us some example data?

I believe we can rewrite it to use only one regex:

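import re
# put quotes around the regex (raw string, so the backslashes reach re intact)
# also, making sure that both \. and \W are good end-delimiters
# (\W already matches ".", so this is equivalent to a plain \W)
parser = re.compile(r'(keyword2:\d+(?:\.|\W))')
db = {}
for entry in entries:
    match = parser.search(entry.replace('keyword1', 'keyword2'))
    db[entry] = match.group(1) if match else None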
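`search()` stops at the first match, so if one entry holds several references only the first is stored. A sketch of the same one-regex idea that keeps every reference by using `findall()`, which, for a pattern with one capturing group, returns a list of that group's text. The sample entries are hypothetical.

```python
import re

# The answer's pattern: keyword2, digits, then one delimiter ("." or another non-word char)
parser = re.compile(r'(keyword2:\d+(?:\.|\W))')

# Hypothetical sample data
entries = ["see keyword1:123. and keyword1:456.", "no reference here"]

db = {}
for entry in entries:
    # str.replace is a plain substring scan, no regex involved
    new_entry = entry.replace('keyword1', 'keyword2')
    # findall returns group 1 for every match, not just the first
    db[entry] = parser.findall(new_entry)

print(db)
```

Note that `str.replace` rewrites every occurrence of `keyword1`, including ones not followed by a serial number.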

Upvotes: 1
