Ryan Conway

Reputation: 71

Python RegEx expression to extract numeric characters AND other special characters

I am trying to develop a regex expression that could convert something like:

"< 0.071U ug/L"

To

"<0.071"

Basically, the expression would only keep numeric values, ".", and "<".

I would like to apply this to a pandas dataframe using .extract()

This works for the floating point numbers:

r'(\d*\.?\d+)'

But I cannot figure out how to also keep "<"

Upvotes: 0

Views: 978

Answers (2)

Wiktor Stribiżew

Reputation: 627083

There is a space between < and the number. If you do not mind that whitespace in the output, you can keep using Series.str.extract with

df['col'].str.extract(r'(<\s*\d*\.?\d+)', expand=False)

See the regex demo. Details:

  • < - a < char
  • \s* - zero or more whitespaces
  • \d* - zero or more digits
  • \.? - an optional .
  • \d+ - one or more digits.
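
As a minimal sketch of how this pattern behaves, the snippet below runs it on a small frame. The column name col comes from the answer above; the second and third sample rows are made up for illustration. Note that rows without a < produce NaN, because the < in the pattern is not optional.

```python
import pandas as pd

# Sample data: the first row is from the question, the others are hypothetical.
df = pd.DataFrame({"col": ["< 0.071U ug/L", "<12 mg/L", "3.5 ug/L"]})

# expand=False returns a Series rather than a one-column DataFrame.
extracted = df["col"].str.extract(r'(<\s*\d*\.?\d+)', expand=False)

result = extracted.tolist()
print(result)
```

The whitespace between < and the number is kept in the first row, which is why the replace-based approach may be preferable when the output must be compact.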

However, you may simply remove all chars other than digits, dots and less than signs with Series.str.replace:

df['col'].str.replace(r'[^<.\d]+', '', regex=True)

See this regex demo.

The [^<.\d]+ pattern is a negated character class that matches one or more chars other than <, . and digits.
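
A short sketch of the replace-based approach, assuming a column named col as above and one extra made-up row. The regex=True argument is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data: the first row is from the question, the second is hypothetical.
df = pd.DataFrame({"col": ["< 0.071U ug/L", "12.5 mg/L"]})

# Delete every run of characters that is not "<", "." or a digit.
cleaned = df["col"].str.replace(r'[^<.\d]+', '', regex=True)

print(cleaned.tolist())
```

One caveat with this approach: any stray . or < elsewhere in the string (for example in a unit such as "mg.L") would also survive, so it suits data where those characters only appear in the value itself.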

Upvotes: 1

Synthaze

Reputation: 6090

import re

s = "< 0.071U ug/L"

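# Remove everything except digits, "<", "." and "-" (the hyphen is literal here, so minus signs are kept)
print(re.sub(r"[^0-9<.-]+", "", s))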

Output:

<0.071

Upvotes: 0
