Regex to match motorcycle names and extract all letters and numbers separately

Question

(\w{1,4})(?:\s{0,1})(\d{1,4})(?:\s{0,1})(\w{1,4})\s

Apologies if this regex is really ugly; I am not fluent in regex at all.

I need a regex to extract all possible combinations from motorcycle names. For instance:

From a Honda CBR500R I would need to get CBR, 500 and R. I am not sure whether a regex could also give me CBR500 and 500R, but that would be really sweet!

Some sample bike names:

Honda CBR500R
CBR 500 R
CBR 500R
CBR500 R
GS1000 S
XYZT 1000P
500ztx
KLR250 Honda
FZR 600 Suzuki
SV650
Text here XXXX 9999 XXXX 9999 XXXXX more text here

Is there a way to improve my regex, making it simpler and smarter?

Wiktor Stribiżew · Accepted Answer

You can use

([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)

See the regex demo

The regex matches:

([A-Z]{2,})? - Group 1 (optional): a sequence of two or more uppercase ASCII letters
[\s-]* - zero or more whitespace characters or hyphens
(\d+) - Group 2: one or more digits
([a-z]+)? - Group 3 (optional): a sequence of one or more lowercase ASCII letters
[\s-]* - zero or more whitespace characters or hyphens
([A-Z]*\b) - Group 4: zero or more uppercase ASCII letters followed by a word boundary (this group can match an empty string).
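The same breakdown can be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag; the name BIKE_RE is only illustrative and does not come from the answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries a comment.
# BIKE_RE is a hypothetical name chosen for this sketch.
BIKE_RE = re.compile(r"""
    ([A-Z]{2,})?   # group 1: optional run of two or more uppercase letters
    [\s-]*         # optional whitespace or hyphens
    (\d+)          # group 2: one or more digits
    ([a-z]+)?      # group 3: optional lowercase suffix
    [\s-]*         # optional whitespace or hyphens
    ([A-Z]*\b)     # group 4: optional uppercase suffix ending at a word boundary
""", re.VERBOSE)

m = BIKE_RE.search("Honda CBR500R")
print(m.groups())  # ('CBR', '500', None, 'R')
```

With search() or match(), an optional group that did not take part in the match is reported as None, whereas findall() reports it as an empty string.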

Here is some sample extraction code in Python:

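import re

# Optional uppercase prefix, digits, optional lowercase suffix, optional uppercase suffix
p = re.compile(r'([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)')
# Triple quotes are required for a string literal that spans several lines
test_str = """Honda CBR500R
CBR 500 R
CBR 500R
CBR500 R
GS1000 S
XYZT 1000P
500ztx
KLR250 Honda
FZR 600 Suzuki
Text here XXXX 9999 XXXX 9999 XXXXX more text here"""
# findall returns one tuple of groups per match; unmatched groups are empty strings
for s in p.findall(test_str):
    print("New Entry:")
    for r in s:
        if r:  # skip empty groups
            print(r)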

Output:

New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
CBR
500
R
New Entry:
GS
1000
S
New Entry:
XYZT
1000
P
New Entry:
500
ztx
New Entry:
KLR
250
New Entry:
FZR
600
New Entry:
XXXX
9999
XXXX
New Entry:
9999
XXXXX
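The question also asked for joined forms such as CBR500 and 500R. A single match cannot return those overlapping pieces as separate groups, but they can be assembled from the captured groups afterwards. The following is only a sketch under that assumption; the helper name name_parts is hypothetical and not part of the answer.

```python
import re

BIKE_RE = re.compile(r'([A-Z]{2,})?[\s-]*(\d+)([a-z]+)?[\s-]*([A-Z]*\b)')

def name_parts(text):
    """Return the captured pieces plus joins of adjacent pieces (hypothetical helper)."""
    m = BIKE_RE.search(text)
    if m is None:
        return []
    # Keep only the groups that actually captured something.
    pieces = [g for g in m.groups() if g]
    combos = list(pieces)
    # Join each adjacent pair, e.g. prefix + digits and digits + suffix.
    for left, right in zip(pieces, pieces[1:]):
        combos.append(left + right)
    return combos

print(name_parts("Honda CBR500R"))  # ['CBR', '500', 'R', 'CBR500', '500R']
```

Because the pieces are joined without a separator, "CBR 500 R" and "CBR500R" give the same combinations.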
