Reputation: 4473
Here's my string:
"ab1 ab-1 f-12 g-12 ffff-123 456"
I'd like to pick out things that have:
Up to 2 digits
Valid: ab1, ab-1, f-12, g-12
So I created the regex:
[\w{1,2}]-?\d{1,2}
But it returns too many things:
>>> re.findall(r'[\w{1,2}]-?\d{1,2}', "ab1 ab-1 f-12 g-12 ffff-123 456")
['b1', 'b-1', 'f-12', 'g-12', 'f-12', '456']
The problems:
[\w{1,2}] needs to be isolated from -? ... I think they are being stuck together.
[\w{1,2}] is getting the smallest possible match, e.g. b-1 from ab-1, when it should get the largest possible match up to 2 characters, ab-1.
Any ideas?
Upvotes: 1
Views: 550
Reputation: 337
The regex should look like this:
\b[a-z]{1,2}-?[\d]{1,2}\b
That's because \w matches all alphanumeric characters, including the digits you don't want to match in the letter position.
There should also be a \b on each side of the RE because of this example: ffff-123. An RE without \b would match part of this example, which it shouldn't, so we add \b to make it match only at word boundaries.
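To see the effect of \w including digits, here is a small sketch that compares the pattern above with a variant that keeps \w in the letter position. The variant and the variable names are only assumptions for illustration, not something from the question.

```python
import re

text = "ab1 ab-1 f-12 g-12 ffff-123 456"

# Illustrative variant: \w in the letter position also accepts digits,
# so purely numeric tokens can satisfy the whole pattern.
with_word_class = re.findall(r'\b\w{1,2}-?\d{1,2}\b', text)

# Pattern from this answer: only lowercase letters before the optional hyphen.
with_letters_only = re.findall(r'\b[a-z]{1,2}-?[\d]{1,2}\b', text)

print(with_word_class)    # ['ab1', 'ab-1', 'f-12', 'g-12', '123', '456']
print(with_letters_only)  # ['ab1', 'ab-1', 'f-12', 'g-12']
```

With \w, the tokens 123 and 456 slip through because their leading digits count as word characters; restricting the class to [a-z] rules them out.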
Upvotes: 1
Reputation: 780787
The RE should be:
[a-z]{1,2}-?\d{1,2}
The expression [\w{1,2}] means any single character that's either a word character, {, 1, a comma, 2, or }.
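A quick way to check this is to test the character class on its own. The sketch below is only illustrative, and the variable names are made up for the example.

```python
import re

# Inside [...], the braces and comma are literal members, not a quantifier.
char_class = re.compile(r'[\w{1,2}]')

# Each of these single characters belongs to the class.
single_chars = ['a', '7', '_', '{', ',', '}']
matched = [c for c in single_chars if char_class.fullmatch(c)]

# Two characters can never match in full: the class consumes exactly one.
two_chars = char_class.fullmatch('ab')

print(matched)    # ['a', '7', '_', '{', ',', '}']
print(two_chars)  # None
```

This is why the original pattern returned b-1 rather than ab-1: only one character is ever consumed before the optional hyphen.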
Note that in your string this will match ff-12, since this part of ffff-123 matches the expression. If you don't want this to happen, you need to add \b around the expression, so that it only matches at word boundaries.
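As a rough check, running both forms against the string from the question should show the difference. The variable names here are just for the example.

```python
import re

text = "ab1 ab-1 f-12 g-12 ffff-123 456"

# Without boundaries, the tail of ffff-123 yields a partial match.
unbounded = re.findall(r'[a-z]{1,2}-?\d{1,2}', text)

# With \b on both sides, a match must start and end at a word boundary.
bounded = re.findall(r'\b[a-z]{1,2}-?\d{1,2}\b', text)

print(unbounded)  # ['ab1', 'ab-1', 'f-12', 'g-12', 'ff-12']
print(bounded)    # ['ab1', 'ab-1', 'f-12', 'g-12']
```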
Upvotes: 3