Reputation: 512
I have a string
import re
a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"
print(re.findall(r"(\d\d\d).*([A-Z]+)", a))
Output: [('123', 'I')]
Expected output: [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]
Because of the .* it matches 123 and then only the final character, I.
What is the proper regex to print the expected output?
Upvotes: 1
Views: 117
Reputation: 43169
While anubhava's expression works, consider using the principle of contrast, which needs 30 steps instead of 108 (a reduction of more than 70%):
(\d{3})[^A-Z]*([A-Z]+)
See the hijacked demo on regex101.com.
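The lazy dot-star is expensive in terms of performance: it advances one character at a time and retries the rest of the pattern at every position, whereas the negated class [^A-Z]* consumes everything up to the first uppercase letter in a single pass.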
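As a rough sketch, here is the negated-class pattern applied to the string from the question. The variable names `pattern` and `matches` are only illustrative.

```python
import re

a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"

# [^A-Z]* skips everything that is not an uppercase letter,
# so the match stops at the first uppercase run after each number.
pattern = re.compile(r"(\d{3})[^A-Z]*([A-Z]+)")
matches = pattern.findall(a)
print(matches)  # [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]
```

Note that, unlike the dot, a negated class also matches newlines, so this pattern can span multiple lines of input.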
Upvotes: 3
Reputation: 784998
Converting my comment to an answer:
You are using a greedy .* which matches from the first 3-digit number all the way to the last uppercase letter in the string, leaving only that final letter for [A-Z]+.
You should make it non-greedy (lazy):
(\d{3}).*?([A-Z]+)
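A small sketch comparing the greedy and lazy versions on the string from the question may make the difference concrete. The names `greedy` and `lazy` are just illustrative.

```python
import re

a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"

# Greedy: .* runs to the end of the string, then backtracks just far
# enough for [A-Z]+ to match a single character.
greedy = re.findall(r"(\d{3}).*([A-Z]+)", a)
print(greedy)  # [('123', 'I')]

# Lazy: .*? stops as soon as [A-Z]+ can match, giving one pair per number.
lazy = re.findall(r"(\d{3}).*?([A-Z]+)", a)
print(lazy)  # [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]
```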
Upvotes: 2