prashantb j

Reputation: 423

Python regex to extract positive and negative numbers between two special characters

I need to extract a value from a string. The value may contain a comma, a decimal point, both, or neither.

For example:

1,921.15
921.15
921
1,921

From the string st below, I need to extract the value 1,921.15 using the re module:

st='["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

I have tried re.findall(r'[-+]?\d+[,.]?\d*',st)[3], but it extracts only 1,921 instead of 1,921.15.

Expected = 1,921.15
Actual = 1,921

Upvotes: 2

Views: 2492

Answers (4)

Wiktor Stribiżew

Reputation: 627100

In general, to extract positive or negative integers or floats from text with a Python regex, you can use the following pattern:

re.findall(r'[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?', text)

See this regex demo. Note: \d{1,3}(?:,\d{3})+ alternative matches integer numbers with comma as a thousand separator. You may adjust it to match the thousand separator you need, say, \xA0 if the thousand separator is a non-breaking space, or \. if it is a dot, etc.
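As a quick check of the general pattern above, here is a minimal sketch that runs it against the values from the question. The name NUMBER_RE and the extra sample -1,234,567.89 are my own additions for illustration, not part of the original question.

```python
import re

# Illustrative name; the pattern itself is the general one given above.
NUMBER_RE = re.compile(r'[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?')

samples = ['1,921.15', '921.15', '921', '1,921', '-1,234,567.89']
matches = [NUMBER_RE.findall(s) for s in samples]
for s, found in zip(samples, matches):
    print(s, '->', found)

# On the question's string the pattern also picks up 20, 10 and 3,
# so the wanted value is the last match rather than the only one.
st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
all_numbers = NUMBER_RE.findall(st)
print(all_numbers)  # ['20', '10', '3', '1,921.15']
```

Because the generic pattern matches every number in the string, you still need an index (or some surrounding context) to pick the one you want.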

Some more options will look like

re.findall(r'[-+]?\d+(?:\.\d+)?', text) # Integer part is compulsory, e.g. 5.55
re.findall(r'[-+]?\d*\.?\d+', text)     # Also matches .57 or -.76
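To make the difference between these two options concrete, here is a small sketch. The sample text is made up for illustration.

```python
import re

text = '5.55 .57 -.76'  # made-up sample input

# Integer part required: numbers with a leading dot lose the dot (and the sign).
strict = re.findall(r'[-+]?\d+(?:\.\d+)?', text)

# Integer part optional: .57 and -.76 are matched whole.
loose = re.findall(r'[-+]?\d*\.?\d+', text)

print(strict)  # ['5.55', '57', '76']
print(loose)   # ['5.55', '.57', '-.76']
```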

Here, you want to extract any number in between > and < chars.

You may use

re.findall(r'>([-+]?\d[\d,.]*)<', text)

See the regex demo.

Details

  • > - a > char
  • ([-+]?\d[\d,.]*) - Group 1:
    • [-+]? - an optional - or +
    • \d - a digit
    • [\d,.]* - 0 or more digits, , or .

See the Python demo:

import re
st='''["FL gr_20 T3\'><strong>+1,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>-921.15</strong>"]'
st='["FL gr_20 T3\'><strong>21.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,11,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,921</strong>"]'
st='["FL gr_20 T3\'><strong>112921</strong>"]'
st='["FL gr_20 T3\'><strong>1.15</strong>"]'
st='["FL gr_20 T3\'><strong>1</strong>"]'''
print(re.findall(r'>([-+]?\d[\d,.]*)<', st))
# => ['+1,921.15', '-921.15', '21.15', '1,11,921.15', '1,921', '112921', '1.15', '1']
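For the exact string from the question, a minimal sketch with re.search looks like this. The variable names are mine, and the second input is a made-up malformed value that shows how permissive the [\d,.]* part is.

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
pattern = r'>([-+]?\d[\d,.]*)<'

# re.search returns the first match; group(1) is the text between > and <.
m = re.search(pattern, st)
value = m.group(1) if m else None
print(value)  # 1,921.15

# The character class accepts any mix of digits, commas and dots,
# so malformed values are captured as well.
odd = re.findall(pattern, '>1..2,,<')
print(odd)  # ['1..2,,']
```

If the input may contain malformed numbers, the stricter general pattern can be placed between the > and < instead.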

Upvotes: 1

MonkeyZeus

Reputation: 20747

It looks like you're trying to capture any valid number format, so this would work:

[+-]?\d+(?:,\d{3})*(\.\d+)*

https://regex101.com/r/5bygVO/1
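One thing to watch for when using this pattern from Python: it contains a capturing group, and re.findall returns the group's content rather than the whole match when a group is present. A minimal sketch, assuming the string from the question:

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
pattern = r'[+-]?\d+(?:,\d{3})*(\.\d+)*'

# findall returns only what the capturing group matched
# (an empty string where the group did not take part).
groups_only = re.findall(pattern, st)

# finditer with group(0) returns each whole match instead.
whole = [m.group(0) for m in re.finditer(pattern, st)]

print(groups_only)  # ['', '', '', '.15']
print(whole)        # ['20', '10', '3', '1,921.15']
```

Making the last group non-capturing, (?:\.\d+)*, should also let re.findall return the whole matches.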

Upvotes: 0

user1717828

Reputation: 7223

Just substitute out the commas and cast to a float:

In [1]: l = ['1,921.15', '921.15', '921', '1,921']
   ...:

In [2]: l
Out[2]: ['1,921.15', '921.15', '921', '1,921']

In [3]: [float(x.replace(',','')) for x in l]
Out[3]: [1921.15, 921.15, 921.0, 1921.0]

If you really want to get rid of the trailing .0, use is_integer() to convert only whole numbers to int:

In [4]: [int(f) if f.is_integer() else f for f in [float(x.replace(',','')) for x in l]]
Out[4]: [1921.15, 921.15, 921, 1921]

Upvotes: 0

luis.parravicini

Reputation: 1227

Your regexp doesn't take into account numbers that contain both ',' and '.'. You could use the regexp below to match all cases:

re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', st)
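Here is a short sketch of how this pattern behaves, assuming the string from the question. The second input, with two thousands separators, is made up to show a limitation: the pattern allows only one comma group.

```python
import re

pattern = r'[-+]?\d+(?:,\d+)?(?:\.\d+)?'
st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

# The wanted value is the fourth match, after 20, 10 and 3.
found = re.findall(pattern, st)
print(found)  # ['20', '10', '3', '1,921.15']

# With more than one comma the number is split into two matches.
split = re.findall(pattern, '1,234,567.89')
print(split)  # ['1,234', '567.89']
```

Changing (?:,\d+)? to (?:,\d+)* should let it handle any number of comma groups.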

Upvotes: 0
