Blue Otter Hat

Reputation: 617

Python Regular Expression -- not matching digits at end of string

This will be really quick marks for someone...

Here's my string:

Jan 13.BIGGS.04222 ABC DMP 15

I'm looking to match:

  1. the date at the front, in (mmm yy) format
  2. the name in the second field
  3. the digits at the end. There could be between one and three.

Here is what I have so far:

(\w{3} \d{2})\.(\w*)\..*(\d{1,3})$

After a lot of playing around with http://www.pythonregex.com/, I can get it to match the '5', but not '15'.
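A minimal reproduction of the problem, assuming Python 3 and the standard `re` module, with the pattern above left unchanged. The greedy `.*` gives back only as many characters as it must, so the last group ends up with a single digit.

```python
import re

text = 'Jan 13.BIGGS.04222 ABC DMP 15'

# The pattern from the question; note the greedy .* before the last group.
pattern = r'(\w{3} \d{2})\.(\w*)\..*(\d{1,3})$'

match = re.search(pattern, text)
# .* first consumes the rest of the string, then backtracks just one
# character so that (\d{1,3}) can match something before $.
print(match.groups())  # ('Jan 13', 'BIGGS', '5')
```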

What am I doing wrong?

Upvotes: 2

Views: 1902

Answers (3)

Tadeck

Reputation: 137310

As an alternative to what @unutbu proposed, you can use the word boundary \b, which matches at a "word border" (here, between the space and the '1'):

(\w{3} \d{2})\.(\w*)\..*\b(\d{1,3})$

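Tested with the site you referred to (written here as a raw string, because in a plain string literal "\b" is a backspace character rather than a word boundary):

>>> import re
>>> regex = re.compile(r"(\w{3} \d{2})\.(\w*)\..*\b(\d{1,3})$")
>>> regex.findall('Jan 13.BIGGS.04222 ABC DMP 15')
[('Jan 13', 'BIGGS', '15')]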

Upvotes: 2

kirilloid

Reputation: 14304

The .* before the digits is greedy and matches as much as it can, leaving as few digits as possible for the last group. You either need to make it non-greedy (with ?, as unutbu said) or stop the last group from starting in the middle of a number by requiring a non-digit \D right before it, e.g. .*\D(\d{1,3})$
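A sketch comparing the two fixes, assuming Python 3's `re` module. Note that simply replacing `.*` with `\D*` is not enough for this particular string, because the middle field `04222` contains digits that `\D*` cannot cross; placing `\D` just before the final group avoids that.

```python
import re

text = 'Jan 13.BIGGS.04222 ABC DMP 15'

# Fix 1: non-greedy .*? stops as early as possible, so the last group
# gets every trailing digit (up to three).
lazy = re.search(r'(\w{3} \d{2})\.(\w*)\..*?(\d{1,3})$', text)

# Fix 2: keep .* greedy but require a non-digit (\D) right before the
# final group, so the group cannot begin in the middle of a number.
non_digit = re.search(r'(\w{3} \d{2})\.(\w*)\..*\D(\d{1,3})$', text)

# Replacing .* with \D* outright fails here: \D* cannot get past "04222".
no_digits = re.search(r'(\w{3} \d{2})\.(\w*)\.\D*(\d{1,3})$', text)

print(lazy.groups())       # ('Jan 13', 'BIGGS', '15')
print(non_digit.groups())  # ('Jan 13', 'BIGGS', '15')
print(no_digits)           # None
```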

Upvotes: 1

unutbu

Reputation: 879321

Use .*? to match .* non-greedily:

In [9]: re.search(r'(\w{3} \d{2})\.(\w*)\..*?(\d{1,3})$', text).groups()
Out[9]: ('Jan 13', 'BIGGS', '15')

Without the question mark, .* matches as many characters as possible, including the digit you want to match with \d{1,3}.

Upvotes: 6