Reputation: 71
I am trying to write a regular expression that converts something like:
"< 0.071U ug/L"
To
"<0.071"
Basically, the expression would only keep numeric values, ".", and "<".
I would like to apply this to a pandas dataframe using .extract()
This works for the floating point numbers:
r'(\d*\.?\d+)'
But I cannot figure out how to also keep "<".
Upvotes: 0
Views: 978
Reputation: 627083
There is a space between < and the number, and if you do not mind this whitespace in your output, you can go on using Series.str.extract with:
df['col'].str.extract(r'(<\s*\d*\.?\d+)', expand=False)
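As a quick check of the extract approach, here is a minimal sketch on made-up data. The sample values are hypothetical, and making the < optional (<?) is my assumption so that plain numbers without a < are captured too. The second step strips the whitespace that \s* pulls into the match.

```python
import pandas as pd

# Hypothetical sample data; the column name 'col' follows the snippet above.
df = pd.DataFrame({"col": ["< 0.071U ug/L", "12.5 mg/L", "ND"]})

# Assumption: '<' is made optional so rows without it still match.
pattern = r"(<?\s*\d*\.?\d+)"
extracted = df["col"].str.extract(pattern, expand=False)

# Remove any whitespace captured between '<' and the number.
cleaned = extracted.str.replace(r"\s+", "", regex=True)

# Rows with no number at all (here "ND") come back as missing values.
print(cleaned.tolist())
```

Note that extract returns only the first match per row, so any later numbers in the same cell are ignored.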
See the regex demo. Details:
< - a < char
\s* - zero or more whitespaces
\d* - zero or more digits
\.? - an optional .
\d+ - one or more digits.
However, you may simply remove all chars other than digits, dots and less than signs with Series.str.replace:
df['col'].str.replace(r'[^<.\d]+', '', regex=True)
See this regex demo.
The [^<.\d]+ pattern is a negated character class that matches one or more chars other than <, . and digits.
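A small sketch of the removal approach on made-up values, mainly to show how it differs from extracting. The sample Series is hypothetical. Because every digit and dot in the cell is kept, a value with two numbers ends up with them glued together, and a cell with no number becomes an empty string rather than a missing value.

```python
import pandas as pd

# Hypothetical sample values, including one with two numbers and one with none.
s = pd.Series(["< 0.071U ug/L", "12.5 mg/L", "0.5 mg per 100 mL", "ND"])

# Delete every run of characters that is not '<', '.', or a digit.
stripped = s.str.replace(r"[^<.\d]+", "", regex=True)

print(stripped.tolist())
# ['<0.071', '12.5', '0.5100', '']
```

So this variant is probably safest when each cell is known to contain a single number.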
Upvotes: 1
Reputation: 6090
import re

s = "< 0.071U ug/L"
# Remove every run of characters that is not a digit, "<", "." or "-".
print(re.sub(r"[^0-9<.-]+", "", s))
Output:
<0.071
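One detail worth spelling out: the character class above also keeps the hyphen, so negative values keep their sign. A minimal sketch, where keep_numeric is a hypothetical helper name and the class is written with the hyphen last, which should be equivalent:

```python
import re

# Keep digits, '<', '.', and '-'; the hyphen is placed last so it is read literally.
PATTERN = re.compile(r"[^0-9<.-]+")

def keep_numeric(text):
    """Strip everything except digits, '<', '.', and '-' from text."""
    return PATTERN.sub("", text)

print(keep_numeric("< 0.071U ug/L"))  # <0.071
print(keep_numeric("-1.5 mg/L"))      # -1.5
```

The flip side is that a hyphen anywhere else in the text would also survive, so this assumes hyphens only appear as minus signs.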
Upvotes: 0