daiyue

Reputation: 7448

How to replace non-numeric chars using regex

I am wondering how to use regex to remove any non-numeric chars from a series, and then keep only the values that are neither empty nor just spaces (a single value may contain one or multiple spaces), in a more efficient way than my current code:

import re
# remove every character that is not a digit 0-9 (note: this also removes '.')
df['numeric_no'] = df['id'].apply(lambda x: re.sub(r"[^0-9]", "", x))
df = df[(df['numeric_no'] != '') & (df['numeric_no'] != ' ')]

Some sample data for the df:

numeric_no
B-27000
44-11-E
LAND-11-4
17772A
88LL9A
321LP-3
UNIT 9 CAM -00-12
WWcard_055_34QE
EE119.45
aaa
b  b

The result should look like this (note that the decimal point in 119.45 is kept):

numeric_no
27000
4411
114
17772
889
3213
90012
05534
119.45
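For comparison, a vectorized sketch of the same idea, assuming the values live in a column named `numeric_no` as in the sample data: `Series.str.replace` with `regex=True` strips everything except digits and `.` in one pass (no `apply`), and a plain comparison drops the empty results. This keeps every `.`, so a value containing a stray dot (not present in the sample) would keep it.

```python
import pandas as pd

# sample data from the question ('b  b' contains two spaces)
df = pd.DataFrame({'numeric_no': [
    'B-27000', '44-11-E', 'LAND-11-4', '17772A', '88LL9A', '321LP-3',
    'UNIT 9 CAM -00-12', 'WWcard_055_34QE', 'EE119.45', 'aaa', 'b  b',
]})

# remove every character that is not a digit or a decimal point, vectorized
cleaned = df['numeric_no'].str.replace(r'[^0-9.]', '', regex=True)

# spaces were already removed above, so only empty strings need filtering
cleaned = cleaned[cleaned != '']
print(cleaned)
```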

Upvotes: 0

Views: 62

Answers (3)

jezrael

Reputation: 862481

I believe you need str.findall with boolean indexing:

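# collect every integer or decimal number in each value, then concatenate them
s = df['numeric_no'].str.findall(r"(\d*\.\d+|\d+)").str.join('')
# boolean indexing: empty strings are falsy, so rows without digits are dropped
s = s[s.astype(bool)]
print(s)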

0     27000
1      4411
2       114
3     17772
4       889
5      3213
6     90012
7     05534
8    119.45
Name: numeric_no, dtype: object
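A self-contained sketch that reproduces the output above end to end: it builds the question's sample data (assuming the column is called `numeric_no`, as in the sample) and applies the same `findall`, `join` and boolean-indexing steps.

```python
import pandas as pd

df = pd.DataFrame({'numeric_no': [
    'B-27000', '44-11-E', 'LAND-11-4', '17772A', '88LL9A', '321LP-3',
    'UNIT 9 CAM -00-12', 'WWcard_055_34QE', 'EE119.45', 'aaa', 'b  b',
]})

# each value becomes a list of integer/decimal substrings, e.g. 'EE119.45' -> ['119.45']
matches = df['numeric_no'].str.findall(r'(\d*\.\d+|\d+)')

# concatenate the pieces; values with no digits get an empty list and become ''
s = matches.str.join('')

# keep only non-empty strings
s = s[s.astype(bool)]
print(s)
```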

Upvotes: 1

Scott Boston

Reputation: 153460

I think you can try:

df.numeric_no.str.extractall(r'(\d+?[\.\d+])').astype(str).groupby(level=0).sum()

Output:

        0
0    2700
1    4411
2      11
3    1777
4      88
5      32
6    0012
7    0534
8  119.45
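The pattern `(\d+?[\.\d+])` matches exactly two characters at a time (one digit followed by one digit, `.` or `+`), so an odd trailing digit is left unmatched; that is why 27000 comes out as 2700 above. A variant that keeps the `extractall` idea but swaps in the full number pattern from the `findall` answer might look like this (a sketch, not the answerer's own code):

```python
import pandas as pd

df = pd.DataFrame({'numeric_no': [
    'B-27000', '44-11-E', 'LAND-11-4', '17772A', '88LL9A', '321LP-3',
    'UNIT 9 CAM -00-12', 'WWcard_055_34QE', 'EE119.45', 'aaa', 'b  b',
]})

# one row per match; the MultiIndex is (original row, match number)
parts = df['numeric_no'].str.extractall(r'(\d*\.\d+|\d+)')[0]

# glue the matches of each original row back together; rows without any
# digits ('aaa', 'b  b') produce no matches and so never appear here
result = parts.groupby(level=0).agg(''.join)
print(result)
```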

Upvotes: 1

revo

Reputation: 48711

You could match and capture numbers, and match anything else:

(\d+(?:\.\d+)?)|.


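Then replace each match with the first capturing group. In Python this back-reference is written \1 ($1 is the syntax used by some other regex flavors and is not special in Python).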

Python code:

import re
re.sub(r"(\d+(?:\.\d+)?)|.", r"\1", x)  # x is one string value, e.g. 'EE119.45' -> '119.45'
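In Python's `re.sub`, the replacement string refers to a group as `\1` (or `\g<1>`), while `$1` is inserted literally. Since Python 3.5, a group that did not take part in the match is replaced with an empty string, which is what lets the `|.` branch delete every other character. A minimal sketch applying the pattern to the sample column (the column name `numeric_no` comes from the question's sample; `keep_numbers` is a hypothetical helper name):

```python
import re
import pandas as pd

PATTERN = re.compile(r"(\d+(?:\.\d+)?)|.")

def keep_numbers(value):
    # hypothetical helper: numbers match group 1 and are written back via \1;
    # any other single character matches '.' with group 1 unset, giving ''
    return PATTERN.sub(r"\1", value)

df = pd.DataFrame({'numeric_no': [
    'B-27000', '44-11-E', 'LAND-11-4', '17772A', '88LL9A', '321LP-3',
    'UNIT 9 CAM -00-12', 'WWcard_055_34QE', 'EE119.45', 'aaa', 'b  b',
]})

s = df['numeric_no'].apply(keep_numbers)
s = s[s != '']  # drop values that contained no digits
print(s)

# '$1' is not a back-reference in Python: every match is replaced by the text '$1'
print(re.sub(r"(\d+(?:\.\d+)?)|.", "$1", "B-27000"))
```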

Upvotes: 1
