Reputation: 14836
I have the following text:
text = "This is a string with C1234567 and CM123456, CM123, F1234567 and also M1234, M123456"
And I would like to extract this list of substrings:
['C1234567', 'CM123456', 'F1234567']
This is what I came up with:
import re
new_string = re.compile(r'\b(C[M0-9]\d{6}|[FM]\d{7})\b')
new_string.findall(text)
However, I was wondering if there's a way to do this faster since I'm interested in performing this operation tens of thousands of times.
I thought I could use ^ to match the beginning of the string, but the regex I came up with
new_string = re.compile(r'\b(^C[M0-9]\d{6}|^[FM]\d{7})\b')
doesn't return anything anymore. I know this is a very basic question, but I'm not sure how to use ^ properly.
Upvotes: 1
Views: 490
Reputation: 5274
Good news and bad news. The bad news: your regex already looks pretty good, so it is going to be hard to improve. The good news: I have some ideas :) If you are after performance, I would try a little outside-the-box thinking. I do Extract, Transform, Load (ETL) work, much of it in Python.
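Before changing anything, it helps to see why the ^ version returns nothing and to have a timing baseline. Here is a minimal sketch of both, using the text and patterns from the question; the names pattern, anchored and elapsed are just ones I picked for illustration. Without re.MULTILINE, ^ only matches at the very start of the whole string, not at the start of each word, so the anchored pattern can only ever match at position 0.

```python
import re
import timeit

text = "This is a string with C1234567 and CM123456, CM123, F1234567 and also M1234, M123456"

# Original pattern from the question: \b already anchors each match to a word boundary.
pattern = re.compile(r'\b(C[M0-9]\d{6}|[FM]\d{7})\b')

# Variant with ^: it can only match at the start of the whole string,
# and this text starts with "This", so nothing is found.
anchored = re.compile(r'\b(^C[M0-9]\d{6}|^[FM]\d{7})\b')

print(pattern.findall(text))   # ['C1234567', 'CM123456', 'F1234567']
print(anchored.findall(text))  # []

# Rough baseline: total seconds for 10,000 calls on this one string.
elapsed = timeit.timeit(lambda: pattern.findall(text), number=10_000)
print(f"{elapsed:.4f} s for 10,000 findall calls")
```

The timing will vary by machine, so treat it only as a number to compare against whatever alternative you try next.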
Upvotes: 2