LittleBobbyTables

Reputation: 4473

Regex to match as many times as possible, but within constraints

Here's my string:

"ab1 ab-1 f-12 g-12 ffff-123 456"

I'd like to pick out things that have:

  1. 1 or 2 letters
  2. an optional hyphen
  3. 1 or 2 digits

So I created the regex:

[\w{1,2}]-?\d{1,2}

But it returns too many things:

>>> re.findall(r'[\w{1,2}]-?\d{1,2}', "ab1 ab-1 f-12 g-12 ffff-123 456")
['b1', 'b-1', 'f-12', 'g-12', 'f-12', '456']

The problems:

  1. [\w{1,2}] needs to be kept separate from -?; I think they are being stuck together.
  2. [\w{1,2}] is getting the smallest possible match, e.g. b-1 from ab-1, when it should get the largest possible match of up to 2 characters (ab-1).

Any ideas?

Upvotes: 1

Views: 550

Answers (2)

43l0v3k

Reputation: 337

The regex should look like this:

\b[a-z]{1,2}-?[\d]{1,2}\b

Use [a-z] rather than \w, because \w matches any word character (letters, digits, and underscore), including the digits you don't want in the leading part of each match.
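A small illustration with Python's re module of why \w cannot tell the letter part from the digit part. The sample string "a1_-" is made up purely for this demonstration.

```python
import re

sample = "a1_-"

# \w matches letters, digits and underscore, but not the hyphen
word_chars = re.findall(r"\w", sample)
print(word_chars)  # ['a', '1', '_']

# [a-z] matches only lowercase ASCII letters
letters = re.findall(r"[a-z]", sample)
print(letters)  # ['a']
```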

Also, the pattern needs \b at both ends because of ffff-123. Without \b, the regex would match part of that token (ff-12) even though it shouldn't match at all, so \b is added to make it match only at word boundaries.
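A quick check in Python against the test string from the question, comparing the pattern with and without the \b anchors. This sketch writes \d where the answer writes [\d]; the two are equivalent.

```python
import re

text = "ab1 ab-1 f-12 g-12 ffff-123 456"

# 1-2 lowercase letters, optional hyphen, 1-2 digits, anchored at word boundaries
with_boundaries = re.findall(r"\b[a-z]{1,2}-?\d{1,2}\b", text)

# Same pattern without \b: a match can start or end inside a longer token
without_boundaries = re.findall(r"[a-z]{1,2}-?\d{1,2}", text)

print(with_boundaries)     # ['ab1', 'ab-1', 'f-12', 'g-12']
print(without_boundaries)  # ['ab1', 'ab-1', 'f-12', 'g-12', 'ff-12']
```

Note that [a-z] only covers lowercase ASCII letters, which is enough for this input.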

Upvotes: 1

Barmar

Reputation: 780787

The RE should be:

[a-z]{1,2}-?\d{1,2}

The expression [\w{1,2}] means any single character that is either a word character or one of the literal characters {, 1, comma, 2, and }. Inside square brackets, {1,2} is not a quantifier.
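A short Python sketch confirming that [\w{1,2}] is a one-character class whose members include the literal characters from {1,2}. This is also why the question's regex returned b1 rather than ab1: the class consumes only the b before the digit.

```python
import re

# Inside [...], {1,2} is not a quantifier: '{', '1', ',', '2', '}' are literal members
char_class = re.compile(r"[\w{1,2}]")

# Each single member character matches on its own
for ch in ["a", "9", "_", "{", ",", "}"]:
    assert char_class.fullmatch(ch)

# But the class never consumes more than one character
two_chars = char_class.fullmatch("ab")
print(two_chars)  # None

# findall therefore returns one-character matches
singles = char_class.findall("ab{,}")
print(singles)  # ['a', 'b', '{', ',', '}']
```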

Note that in your string this will match ff-12, since that part of ffff-123 matches the expression. If you don't want this, add \b around the expression so that it only matches at word boundaries.
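To see where the unwanted match sits, here is a minimal Python sketch using re.finditer on the single token ffff-123 (isolated from the full string only for this illustration). It reports the matched text and its span, then repeats the search with \b added.

```python
import re

token = "ffff-123"

# Pattern as given in this answer: no word-boundary anchors
matches = [(m.group(), m.span()) for m in re.finditer(r"[a-z]{1,2}-?\d{1,2}", token)]
print(matches)  # [('ff-12', (2, 7))]

# With \b around the expression, a match must start and end at a word boundary
anchored = re.findall(r"\b[a-z]{1,2}-?\d{1,2}\b", token)
print(anchored)  # []
```

The match starts at index 2, in the middle of the run of f's, which is exactly what \b rules out.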

Upvotes: 3
