gojikyou

Reputation: 11

Regular expression quantifier

I have a regular expression pattern as follows:

.*\b(?P<core>[A-Z][0-9]?\b.*)(?P<extra>\b[0-9]+[xX][0-9]+.*)?\.png

I want it to match strings such as the following:

UI SCREEN 5-1 F2 ROUND TAB REFLECTION 224x18px.png

In Python, I get the following result:

{u'core': u'F2 ROUND TAB REFLECTION 224x18px', u'extra': None}

instead of

{u'core': u'F2 ROUND TAB REFLECTION ', u'extra': u'224x18px'}

As far as I know, regex quantifiers are greedy by default in Python, so I think it should work.

What am I doing wrong?

Upvotes: 0

Views: 340

Answers (3)

jdi

Reputation: 92627

Add a ? after the greedy .* inside your core group to make it lazy:

import re
x = "UI SCREEN 5-1 F2 ROUND TAB REFLECTION 224x18px.png"
re.search(r'.*\b(?P<core>[A-Z][0-9]?\b.*?)(?P<extra>\b[0-9]+[xX][0-9]+.*)?\.png', x).groups()

# OUTPUT
('F2 ROUND TAB REFLECTION ', '224x18px')

Upvotes: 1

Melug

Reputation: 1031

Could you post the regular expression exactly as you are using it? I can't see the group names in your regex.

>>> import re
>>> a = "UI SCREEN 5-1 F2 ROUND TAB REFLECTION 224x18px.png"
>>> re.match(r'(?P<core>[A-Z0-9- ]+) (?P<extra>[0-9]+[xX][0-9]+px)\.png', a).groups()
('UI SCREEN 5-1 F2 ROUND TAB REFLECTION', '224x18px')

Upvotes: 0

donkopotamus

Reputation: 23206

The expression (?P<core>[A-Z][0-9]?\b.*) probably doesn't do what you think it does. It will match:

  • an uppercase letter
  • then a digit, or not
  • then a word boundary
  • then absolutely everything after that

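That final .* swallows everything up to your terminating .png (which should be written \.png). Because the extra group is optional, the regex engine never has to backtrack to give it anything.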
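
To see the difference, here is a rough sketch that runs the pattern from the question next to the same pattern with a lazy .*? inside the core group. The variable names and the second file name (one without dimensions) are made up for illustration; only the first file name comes from the question.

```python
import re

# File name taken from the question.
name = "UI SCREEN 5-1 F2 ROUND TAB REFLECTION 224x18px.png"
# Hypothetical file name without dimensions, just for comparison.
plain = "UI SCREEN 5-1 F2 ROUND TAB.png"

# Pattern from the question: greedy .* inside the core group.
greedy = re.compile(
    r'.*\b(?P<core>[A-Z][0-9]?\b.*)(?P<extra>\b[0-9]+[xX][0-9]+.*)?\.png'
)
# Same pattern, but with a lazy .*? inside the core group.
lazy = re.compile(
    r'.*\b(?P<core>[A-Z][0-9]?\b.*?)(?P<extra>\b[0-9]+[xX][0-9]+.*)?\.png'
)

greedy_groups = greedy.match(name).groupdict()
lazy_groups = lazy.match(name).groupdict()
plain_groups = lazy.match(plain).groupdict()

print(greedy_groups)  # core keeps the dimensions, extra is None
print(lazy_groups)    # core stops before the dimensions, extra gets them
print(plain_groups)   # no dimensions present, so extra is None
```

With the lazy version, core only grows one character at a time, so the optional extra group gets a chance to match the dimensions before core reaches them. Note that core still keeps its trailing space, so you may want to strip it afterwards.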

Upvotes: 1
