Naive

Reputation: 512

Match all occurrences of string using re.findall

I have a string and run re.findall on it:

import re

a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"

print(re.findall(r"(\d\d\d).*([A-Z]+)", a))

Output: [('123', 'I')]

Expected output: [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]

Because of the .*, the match runs from 123 all the way to the final character I. What is the proper regex to get the expected output?

Upvotes: 1

Views: 117

Answers (2)

Jan

Reputation: 43169

While anubhava's expression works, consider using the principle of contrast instead (30 steps compared to 108 steps for the lazy version, a reduction of more than 70%):

(\d{3})[^A-Z]*([A-Z]+)

See the hijacked demo on regex101.com.
The lazy dot-star is expensive in terms of performance: after every character it consumes, the engine has to try the rest of the pattern again.
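
As a rough sketch of the contrast idea in Python (the variable names here are only illustrative, and the step counts quoted above come from regex101, not from Python's re module), the negated character class gives the same result on the question's string:

```python
import re

a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"

# [^A-Z]* consumes everything that is NOT an uppercase letter,
# so it stops exactly where [A-Z]+ can start matching.
contrast = re.findall(r"(\d{3})[^A-Z]*([A-Z]+)", a)

# Lazy version from the other answer, for comparison.
lazy = re.findall(r"(\d{3}).*?([A-Z]+)", a)

print(contrast)          # [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]
print(contrast == lazy)  # True
```

One difference to keep in mind: [^A-Z] also matches newlines, while . does not unless re.DOTALL is set, so the two patterns can diverge on multi-line input.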

Upvotes: 3

anubhava

Reputation: 784998

Converting my comment to an answer:

You are using a greedy .*, which matches from the first three-digit number all the way to the end of the string and then backtracks only far enough for [A-Z]+ to match the very last uppercase letter.

You should make it non-greedy (lazy):

(\d{3}).*?([A-Z]+)

RegEx Demo
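
A minimal sketch comparing the two quantifiers on the string from the question (the names greedy and lazy are just illustrative):

```python
import re

a = "123 some_string ABC 456 some_string DEF 789 some_string GHI"

# Greedy: .* grabs as much as possible, leaving only the last "I" for [A-Z]+.
greedy = re.findall(r"(\d{3}).*([A-Z]+)", a)

# Lazy: .*? grabs as little as possible, so each match ends at the
# first run of uppercase letters after the digits.
lazy = re.findall(r"(\d{3}).*?([A-Z]+)", a)

print(greedy)  # [('123', 'I')]
print(lazy)    # [('123', 'ABC'), ('456', 'DEF'), ('789', 'GHI')]
```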

Upvotes: 2
