Reputation: 423
I need to extract a numeric value from a string. The value may contain a comma, a decimal point, both, or neither.
For example:
1,921.15
921.15
921
1,921
From the string st below, I need to extract the value 1,921.15 using the re module:
st='["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
I have tried re.findall(r'[-+]?\d+[,.]?\d*',st)[3], but it extracts only 1,921 instead of 1,921.15.
Expected = 1,921.15
Actual = 1,921
Upvotes: 2
Views: 2492
Reputation: 627100
In general, when you want to extract positive or negative integers or floats from text using a Python regex, you can use the following pattern:
re.findall(r'[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?', text)
See this regex demo.
Note: the \d{1,3}(?:,\d{3})+ alternative matches integers that use a comma as the thousands separator. You may adjust it to match the thousands separator you need, say, \xA0 if the separator is a non-breaking space, or \. if it is a dot, etc.
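As a rough illustration of the note above, the sketch below runs the general pattern on the sample values and on the string from the question. The names NUMBER, NUMBER_DOT, samples and the dot-separated variant are assumptions made for this example; the variant also assumes that a comma is then the decimal mark, which the answer does not state.

```python
import re

# Pattern from the answer: optional sign, then either comma-grouped
# thousands or a plain run of digits, then an optional fraction.
NUMBER = re.compile(r'[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?')

samples = '1,921.15 921.15 921 1,921 -5.5'
plain = NUMBER.findall(samples)
print(plain)  # ['1,921.15', '921.15', '921', '1,921', '-5.5']

# The string from the question: digits inside the class names match too,
# so the wanted value is the last item (index 3).
st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
from_question = NUMBER.findall(st)
print(from_question)  # ['20', '10', '3', '1,921.15']

# Assumed variant: dot as thousands separator, comma as decimal mark.
NUMBER_DOT = re.compile(r'[-+]?(?:\d{1,3}(?:\.\d{3})+|\d+)(?:,\d+)?')
dotted = NUMBER_DOT.findall('1.921,15 921,15 921')
print(dotted)  # ['1.921,15', '921,15', '921']
```

Because every group is non-capturing, re.findall returns the whole match for each number.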
A few more options:
re.findall(r'[-+]?\d+(?:\.\d+)?', text) # Integer part is compulsory, e.g. 5.55
re.findall(r'[-+]?\d*\.?\d+', text) # Also matches .57 or -.76
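To make the difference between these two options concrete, here is a small comparison on one made-up input (the sample text and variable names are assumptions for this example):

```python
import re

text = '5.55 .57 -.76 10'

# Integer part is compulsory: a leading "." (and a sign in front of it)
# is skipped, so only the digits after the dot are returned.
with_int_part = re.findall(r'[-+]?\d+(?:\.\d+)?', text)
print(with_int_part)  # ['5.55', '57', '76', '10']

# Integer part is optional: ".57" and "-.76" are matched whole.
optional_int_part = re.findall(r'[-+]?\d*\.?\d+', text)
print(optional_int_part)  # ['5.55', '.57', '-.76', '10']
```

Note that the first pattern silently turns .57 into 57, so the second one is the safer choice when such values can occur.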
Here, you want to extract any number between the > and < chars. You may use
re.findall(r'>([-+]?\d[\d,.]*)<', text)
See the regex demo.
Details
> - a > char
([-+]?\d[\d,.]*) - Group 1:
    [-+]? - an optional - or +
    \d - a digit
    [\d,.]* - 0 or more digits, commas (,) or dots (.)
< - a < char
See the Python demo:
import re
st='''["FL gr_20 T3\'><strong>+1,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>-921.15</strong>"]'
st='["FL gr_20 T3\'><strong>21.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,11,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,921</strong>"]'
st='["FL gr_20 T3\'><strong>112921</strong>"]'
st='["FL gr_20 T3\'><strong>1.15</strong>"]'
st='["FL gr_20 T3\'><strong>1</strong>"]'''
print(re.findall(r'>([-+]?\d[\d,.]*)<', st))
# => ['+1,921.15', '-921.15', '21.15', '1,11,921.15', '1,921', '112921', '1.15', '1']
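If only one value is needed, as in the question, re.search with the same pattern may be simpler than indexing into a findall result. This is a minimal sketch using the question's string; the variable names are assumptions for this example:

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

# re.search returns the first match or None, so guard before .group().
m = re.search(r'>([-+]?\d[\d,.]*)<', st)
value = m.group(1) if m else None
print(value)  # 1,921.15

# Caveat: [\d,.]* accepts any mix of digits, commas and dots,
# so malformed input between > and < is captured as well.
loose = re.search(r'>([-+]?\d[\d,.]*)<', '>1..2,,<').group(1)
print(loose)  # 1..2,,
```

The pattern relies on the surrounding > and < for context rather than on validating the number itself, so it fits markup like this but is not a general number validator.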
Upvotes: 1
Reputation: 20747
It looks like you're trying to capture any valid number format, so this would work:
[+-]?\d+(?:,\d{3})*(\.\d+)*
https://regex101.com/r/5bygVO/1
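One thing to watch when using this pattern from Python: it contains a capturing group, and re.findall returns the group contents rather than the whole match when a pattern has groups. The sketch below shows that behaviour and two ways around it; the non-capturing rewrite is an adaptation for this example, not part of the answer.

```python
import re

pattern = r'[+-]?\d+(?:,\d{3})*(\.\d+)*'
text = '1,921.15 921'

# With one capturing group, findall yields only that group
# (an empty string where the group did not take part in the match).
groups_only = re.findall(pattern, text)
print(groups_only)  # ['.15', '']

# Option 1: keep the pattern and read the whole match from finditer.
whole = [m.group(0) for m in re.finditer(pattern, text)]
print(whole)  # ['1,921.15', '921']

# Option 2 (adaptation): make the group non-capturing.
non_capturing = re.findall(r'[+-]?\d+(?:,\d{3})*(?:\.\d+)?', text)
print(non_capturing)  # ['1,921.15', '921']
```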
Upvotes: 0
Reputation: 7223
Just substitute out the commas and cast to a float:
In [1]: l = ['1,921.15', '921.15', '921', '1,921']
...:
In [2]: l
Out[2]: ['1,921.15', '921.15', '921', '1,921']
In [3]: [float(x.replace(',','')) for x in l]
Out[3]: [1921.15, 921.15, 921.0, 1921.0]
If you really want to get rid of the trailing .0s, use is_integer() to convert only whole numbers to int:
In [4]: [int(f) if f.is_integer() else f for f in [float(x.replace(',','')) for x in l]]
Out[4]: [1921.15, 921.15, 921, 1921]
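Putting extraction and conversion together for the string from the question might look like the sketch below. The helper name to_number is hypothetical, and the extraction pattern is borrowed from another answer in this thread.

```python
import re

def to_number(text):
    """Hypothetical helper: drop thousands commas, return int or float."""
    value = float(text.replace(',', ''))
    return int(value) if value.is_integer() else value

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

# Pull out the text between > and <, then convert it.
raw = re.findall(r'>([-+]?\d[\d,.]*)<', st)
numbers = [to_number(x) for x in raw]
print(raw)      # ['1,921.15']
print(numbers)  # [1921.15]
```

This assumes the comma is always a thousands separator; input that uses a comma as the decimal mark would be converted incorrectly.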
Upvotes: 0
Reputation: 1227
Your regexp doesn't take into account the case where a number has both ',' and '.'. You could use the regexp below to match all of your cases:
re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', st)
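A quick check of this pattern against the question's string, plus one limitation worth knowing: (?:,\d+)? allows a single comma group only, so numbers with two or more commas are split. The variant with * in the last line is an assumption added for this example, not part of the answer.

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

# The wanted value is at index 3, after the digits from the class names.
matches = re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', st)
print(matches)  # ['20', '10', '3', '1,921.15']

# Only one comma group is allowed, so a larger number is split in two.
split = re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', '1,234,567.89')
print(split)  # ['1,234', '567.89']

# Assumed variant: allow any number of comma groups with * instead of ?.
joined = re.findall(r'[-+]?\d+(?:,\d+)*(?:\.\d+)?', '1,234,567.89')
print(joined)  # ['1,234,567.89']
```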
Upvotes: 0