Reputation: 1680
I am trying to extract all numbers, including those with decimal points and commas, from a string column using pandas.
This is my DataFrame:
rate_number
0 92 rate
0 33 rate
0 9.25 rate
0 (4,396 total
0 (2,620 total
I tried df['rate_number'].str.extract(r'(\d+)', expand=False),
but the results were not correct: \d+ matches only the first run of digits, so 9.25 becomes 9 and 4,396 becomes 4.
The result I need is the following:
rate_number
0 92
0 33
0 9.25
0 4,396
0 2,620
Upvotes: 0
Views: 1297
Reputation: 51
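There is a small error in the position of the asterisk in the other answer's pattern; it belongs on the leading digit class:
df['rate_number_2'] = df['rate_number'].str.extract('([0-9]*[,.][0-9]*)')
Note that this pattern requires a comma or dot, so values without one (92, 33) come back as NaN.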
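To see how a pattern behaves on the question's sample rows, a small check like the one below can help. This is only a sketch: SEPARATOR_REQUIRED is the pattern from this answer, while SEPARATOR_OPTIONAL is an assumed alternative that does not come from the thread, and both constant names are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {"rate_number": ["92 rate", "33 rate", "9.25 rate", "(4,396 total", "(2,620 total"]}
)

# Pattern from this answer: a comma or dot is mandatory.
SEPARATOR_REQUIRED = r"([0-9]*[,.][0-9]*)"
# Assumed alternative (not from the thread): digits, then optional
# separator-plus-digits groups.
SEPARATOR_OPTIONAL = r"([0-9]+(?:[,.][0-9]+)*)"

required = df["rate_number"].str.extract(SEPARATOR_REQUIRED, expand=False)
optional = df["rate_number"].str.extract(SEPARATOR_OPTIONAL, expand=False)

# Rows without a comma or dot have no match for the first pattern.
print(required.isna().tolist())  # [True, True, False, False, False]
print(optional.tolist())         # ['92', '33', '9.25', '4,396', '2,620']
```

The inner group is non-capturing, so str.extract still sees a single capture group and returns a Series when expand=False is passed.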
Upvotes: 1
Reputation: 1175
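Dan's comment above is not very noticeable but worked for me. For each DataFrame in a list, it extracts the first integer or decimal from every column except the first:
new_df_arr = []  # collects the cleaned copies
for df in df_arr:
    df = df.astype(str)
    df_copy = df.copy()
    for i in range(1, len(df.columns)):
        # keep the first integer or decimal found in each cell
        df_copy[df.columns[i]] = df_copy[df.columns[i]].str.extract(r'(\d+[.]?\d*)', expand=False)  # replace(r'[^0-9]+','')
    new_df_arr.append(df_copy)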
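The same loop can be wrapped in a function so the pattern is easy to swap. This is a rough sketch, not a definitive version: extract_numbers, its parameters, and the sample frame are hypothetical names invented here, and it assumes (as the loop above does) that the first column should be left alone.

```python
import pandas as pd


def extract_numbers(frames, pattern=r"(\d+[.]?\d*)"):
    """Return string copies of the frames with every column except the
    first reduced to its first match of `pattern`."""
    cleaned = []
    for frame in frames:
        out = frame.astype(str)  # astype returns a new object; the input is untouched
        for col in out.columns[1:]:
            out[col] = out[col].str.extract(pattern, expand=False)
        cleaned.append(out)
    return cleaned


# Hypothetical sample frame for illustration.
sample = pd.DataFrame({"name": ["a", "b"], "value": ["92 rate", "9.25 rate"]})
result = extract_numbers([sample])
print(result[0]["value"].tolist())  # ['92', '9.25']
```

Note that the default pattern only allows a dot, so a value such as 4,396 would be cut at the comma; pass a different pattern if commas matter.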
Upvotes: 0
Reputation: 7594
You can try this:
df['rate_number'] = df['rate_number'].replace(r'\(|[a-zA-Z]+', '', regex=True)
Better answer:
df['rate_number_2'] = df['rate_number'].str.extract('([0-9][,.]*[0-9]*)')
Output:
  rate_number rate_number_2
0          92            92
1          33            33
2        9.25          9.25
3       4,396         4,396
4       2,620         2,620
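The pattern works for the sample rows, but it only anchors on a single leading digit, so it may truncate values where a separator follows two or more digits. The sketch below checks this on hypothetical inputs that are not in the question; the adjusted pattern is an assumption on my part, not something from this answer.

```python
import pandas as pd

# Hypothetical inputs, not taken from the question's data.
s = pd.Series(["12.5 rate", "(12,345 total"])

# Pattern from the answer: one digit, optional separators, then more digits.
original = s.str.extract(r"([0-9][,.]*[0-9]*)", expand=False)
# Assumed adjustment: one or more leading digits, at most one separator.
adjusted = s.str.extract(r"([0-9]+[,.]?[0-9]*)", expand=False)

print(original.tolist())  # ['12', '12']
print(adjusted.tolist())  # ['12.5', '12,345']
```

With the original pattern, the second digit is consumed by the trailing [0-9]* before any separator is seen, which is why the match stops early.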
Upvotes: 2