bioslime

Reputation: 1961

How to fix my nonworking Python regex match?

I want to grab the whole number out of this string: <some>some 344.3404.3 numbers<tag>.

On the Pythex emulator website this works with [\d\.]* (a digit or dot, repeated zero or more times). In Python I get back the whole string:

Input:

import re
re.match(r'[\d\.]*', '<some>some 344.3404.3 numbers<tag>').string

Output:

'<some>some 344.3404.3 numbers<tag>'

What am I missing?

Running Python 3.3.5 on Windows 7, 64-bit.

Upvotes: 1

Views: 46

Answers (2)

Tim Pietzcker

Reputation: 336118

The string attribute of a regex match object contains the input string of the match, not the matched content.
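To make the difference concrete, here is a minimal sketch using the pattern and input from the question. The variable names (text, m, input_text, matched_text, matched_span) are only illustrative.

```python
import re

text = '<some>some 344.3404.3 numbers<tag>'
m = re.match(r'[\d.]*', text)

# .string is the original input that was passed to match(), unchanged
input_text = m.string

# .group() is the text the pattern actually matched; here it is empty,
# because * allows zero repetitions and position 0 holds '<'
matched_text = m.group()

# .span() gives the (start, end) indices of the match in the input
matched_span = m.span()

print(repr(input_text))
print(repr(matched_text))
print(matched_span)
```

So the question's code did produce a match object, but the match itself was zero characters long; .string merely echoed the input.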

If you want the (first) matching part, you need to change three things:

  • use re.search() because re.match() will only find a match at the start of the string,
  • access the group() method of the match object,
  • use + instead of * or you'll get an empty (zero-length) match unless the match happens to be at the start of the string.

Therefore, use

>>> re.search(r'[\d.]+', '<some>some 344.3404.3 numbers<tag>').group()
'344.3404.3'

or

>>> re.findall(r'[\d.]+', '<some>some 344.3404.3 numbers more 234.432<tag>')
['344.3404.3', '234.432']

if you expect more than one match.
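As a rough illustration of why each of the changes listed above matters, the following sketch applies them one at a time to the string from the question. The variable names are made up for this example, and the final None check is an added precaution rather than part of the answer itself.

```python
import re

text = '<some>some 344.3404.3 numbers<tag>'

# match() only tries at position 0; with + there is no digit or dot
# there, so no match object is returned at all
match_plus = re.match(r'[\d.]+', text)

# search() scans forward, but * accepts zero characters, so it
# succeeds immediately at position 0 with an empty match
search_star = re.search(r'[\d.]*', text)

# search() with + has to consume at least one digit or dot,
# so it skips ahead to the first real run
search_plus = re.search(r'[\d.]+', text)

# search() returns None when nothing matches, so guard before .group()
no_digits = re.search(r'[\d.]+', 'no numbers here')
safe_result = no_digits.group() if no_digits else None

print(match_plus)
print(repr(search_star.group()))
print(search_plus.group())
print(safe_result)
```

Calling .group() directly on a failed search raises AttributeError, which is why the guard can be useful when the input is not guaranteed to contain a number.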

Upvotes: 2

Casimir et Hippolyte

Reputation: 89547

You can use this:

re.search(r'[\d.]+', '<some>some 344.3404.3 numbers<tag>').group()

Notes: Your pattern didn't work because [\d.]* can match the empty string, and it does so at the first position, where there is no digit or dot. This is why I have replaced the quantifier with + and changed the method from match to search.

There is no need to escape the dot inside a character class, since it is seen by default as a literal character.
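A small sketch of that point, with illustrative variable names: the escaped and unescaped forms behave identically inside a character class, while outside a class the unescaped dot matches any character except a newline.

```python
import re

text = '<some>some 344.3404.3 numbers<tag>'

# Inside a character class, '.' and '\.' both mean a literal dot
escaped = re.findall(r'[\d\.]+', text)
unescaped = re.findall(r'[\d.]+', text)

# Outside a character class the two differ
any_char = re.findall(r'.', 'a.b')      # unescaped: any character
literal_dot = re.findall(r'\.', 'a.b')  # escaped: a literal dot only

print(escaped == unescaped)
print(any_char)
print(literal_dot)
```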

Upvotes: 2
