O.rka

Reputation: 30687

Regex search to extract float from string. Python

import re

sequence = 'i have -0.03 dollars in my hand'

m = re.search('(have )(-\w[.]+)( dollars\w+)',sequence)

print m.group(0)
print m.group(1)
print m.group(2)

I'm looking for a way to extract the text between two known pieces of text. In this case, the format is 'i have ', followed by a negative float, followed by ' dollars\w+'.

How do I use re.search to extract this float? Why don't the groups work this way? I know there's something I can tweak to get it to work with these groups. Any help would be greatly appreciated.

I thought I could use groups with parentheses, but I got an error.

Upvotes: 0

Views: 3247

Answers (3)

IceArdor

Reputation: 2041

This question has already been asked in many formulations before. You're looking for a regular expression that will find a number. Since number formats may include decimals, commas, exponents, plus/minus signs, and leading zeros, you'll need a robust regular expression. Fortunately, this regular expression has already been written for you.

See How to extract a floating number from a string and Regular expression to match numbers with or without commas and decimals in text
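As a rough sketch of what a more general pattern might look like: the pattern and the helper name extract_floats below are my own illustration, not taken from the linked answers. It assumes an optional sign, an optional fractional part, and an optional exponent, and it does not handle thousands separators.

```python
import re

# Hypothetical pattern, for illustration only: optional sign, then either
# digits with an optional fractional part or a bare fraction like .5,
# then an optional exponent.
NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def extract_floats(text):
    """Return every number found in text, converted to float."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return [float(token) for token in NUMBER.findall(text)]

print(extract_floats('i have -0.03 dollars in my hand'))
```

Unlike a pattern anchored on 'have' and 'dollars', this picks up every number in the string, so it fits best when the surrounding text is not fixed.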

Upvotes: 0

Tim Pietzcker

Reputation: 336148

Your regex doesn't match for several reasons:

  • it always requires a - (OK in this case, questionable in general)
  • it requires exactly one character before the . (and since \w matches any word character, it even allows non-digits like A).
  • it requires one or more dots, but allows no digits after them.
  • it requires one or more alphanumerics immediately after dollars.

So it would match "I have -X.... dollarsFOO in my hand" but not "I have 0.10 dollars in my hand".
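A quick way to check this is to run the pattern from the question against all three strings. The variable names here are just for illustration:

```python
import re

# The pattern from the question, written as a raw string.
pattern = r'(have )(-\w[.]+)( dollars\w+)'

odd = re.search(pattern, 'I have -X.... dollarsFOO in my hand')
real = re.search(pattern, 'I have 0.10 dollars in my hand')
asked = re.search(pattern, 'i have -0.03 dollars in my hand')

print(odd.group(0))  # the text matched in the odd-looking string
print(real)          # None: no match
print(asked)         # None: no match
```

Because re.search returns None when nothing matches, calling m.group(...) on the result raises an AttributeError, which is most likely the error you saw.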

Also, there is no use in putting fixed texts into capturing parentheses.

m = re.search(r'\bhave (-?\d+\.\d+) dollars\b', sequence)

would make much more sense.
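A minimal sketch of how that might be used, assuming you want the value as a float and want to cope with strings that don't match (the name amount is only for illustration):

```python
import re

sequence = 'i have -0.03 dollars in my hand'

m = re.search(r'\bhave (-?\d+\.\d+) dollars\b', sequence)

# re.search returns None when there is no match, so check before using m.
if m is None:
    amount = None
else:
    amount = float(m.group(1))  # group 1 is the only capturing group

print(amount)
```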

Upvotes: 2

falsetru

Reputation: 369064

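-\w[.]+ does not match -0.03: \w matches only a single character (the 0), and [.]+ matches one or more literal dots (a . inside [...] is taken literally), so nothing in the pattern can match the 03 after the dot.

The \w+ after dollars also prevents the pattern from matching the sequence: there is no word character directly after dollars.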

Use (-?\d+\.\d+) as pattern:

import re

sequence = 'i have -0.03 dollars in my hand'

m = re.search(r'(have )(-?\d+\.\d+)( dollars)', sequence)

print(m.group(1))  # captured groups are numbered from 1
print(m.group(2))
print(m.group(3))

BTW, captured group numbers start from 1 (group(0) returns the entire matched string).
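To see the numbering at a glance, here is a small sketch comparing group(0) with groups(); the names whole and parts are just for illustration:

```python
import re

sequence = 'i have -0.03 dollars in my hand'
m = re.search(r'(have )(-?\d+\.\d+)( dollars)', sequence)

whole = m.group(0)  # the entire matched string
parts = m.groups()  # tuple holding groups 1, 2 and 3

print(whole)
print(parts)
```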

Upvotes: 2
