The use of regular expressions

Question

I'm new to regular expressions, but I want to match a pattern in about 2 million strings. The original strings come in three forms, shown below:

EC-2A-07
EC-2-07
T1-ZJF-4

I want to get the three substrings separated by -; that is, for the first string I want to get EC, 2A, and 07 respectively. In particular, for the first string, I only want to split the part before <.

I tried .+[\d]\W, but it does not recognize EC-2-07. I then used .split('-') to split the string and indexed into the returned list to get what I want, but that is inefficient.

Can you suggest an efficient regular expression that meets my requirements? Thanks a lot!

Wiktor Stribiżew · Accepted Answer

You need to use

^([A-Z0-9]{2})-([A-Z0-9]{1,3})-([A-Z0-9]{1,2})

See the regex demo

Details:

^ - start of string
([A-Z0-9]{2}) - Group 1 capturing 2 uppercase ASCII letters or digits
- - a hyphen
([A-Z0-9]{1,3}) - Group 2 capturing 1 to 3 uppercase ASCII letters or digits
- - a hyphen
([A-Z0-9]{1,2}) - Group 3 capturing 1 to 2 uppercase ASCII letters or digits.

You may adjust the values in the {min,max} quantifiers as required.
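As a rough sketch, the breakdown above can also be written into the pattern itself using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. The variable name rx and the sample strings are only illustrative; the quantifier values are the ones from this answer and can be changed in place.

```python
import re

# The same pattern as above, annotated piece by piece.
# re.VERBOSE ignores whitespace in the pattern and allows # comments.
rx = re.compile(r"""
    ^                   # start of string
    ([A-Z0-9]{2})       # Group 1: 2 uppercase ASCII letters or digits
    -                   # a hyphen
    ([A-Z0-9]{1,3})     # Group 2: 1 to 3 uppercase ASCII letters or digits
    -                   # a hyphen
    ([A-Z0-9]{1,2})     # Group 3: 1 to 2 uppercase ASCII letters or digits
""", re.VERBOSE)

for s in ["EC-2A-07", "EC-2-07", "T1-ZJF-4"]:
    m = rx.match(s)
    # m is None when the string does not follow the expected form
    print(s, "->", m.groups() if m else None)
```

To allow, say, longer middle parts, only the {1,3} in the Group 2 line would need to change.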

Sample Python demo:

import re
regex = r"^([A-Z0-9]{2})-([A-Z0-9]{1,3})-([A-Z0-9]{1,2})"
test_str = "EC-2A-07\nEC-2-07\nT1-ZJF-4"
matches = re.findall(regex, test_str, re.MULTILINE)
print(matches)
# or with lines
lines = test_str.split('\n')
# compile once; rx.match() only matches at the start of each line
rx = re.compile(r"([A-Z0-9]{2})-([A-Z0-9]{1,3})-([A-Z0-9]{1,2})")
for line in lines:
    m = rx.match(line)
    if m:
        print('{0} :: {1} :: {2}'.format(m.group(1), m.group(2), m.group(3)))
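Since the question is about speed over roughly 2 million strings, it may be worth measuring rather than guessing. The following is only a sketch: the helper names parse_with_regex and parse_with_split and the repeated sample list are made up for illustration, and the timings will depend on your machine and data. Note that the split version does not validate the parts the way the regex does.

```python
import re
import timeit

# Pattern from the answer, compiled once and reused for every string.
RX = re.compile(r"([A-Z0-9]{2})-([A-Z0-9]{1,3})-([A-Z0-9]{1,2})")

def parse_with_regex(s):
    """Return the three parts as a tuple, or None if s does not match."""
    m = RX.match(s)
    return m.groups() if m else None

def parse_with_split(s):
    """Return the three parts using str.split, without validating them."""
    parts = s.split("-", 2)
    return tuple(parts) if len(parts) == 3 else None

# Made-up workload: the three sample strings repeated many times.
samples = ["EC-2A-07", "EC-2-07", "T1-ZJF-4"] * 1000

# Time both approaches on the same data; no particular winner is assumed.
t_regex = timeit.timeit(lambda: [parse_with_regex(s) for s in samples], number=10)
t_split = timeit.timeit(lambda: [parse_with_split(s) for s in samples], number=10)
print("regex: {:.3f}s  split: {:.3f}s".format(t_regex, t_split))
```

If the input is already known to be well formed, the split variant may be enough; the regex is the safer choice when malformed strings have to be rejected.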
