Reputation: 2273
I have the following dataframe:
Name Price
AG €26.74
BAE €0.25
C&C Nan
CRH €30.57
I would like to remove the euro sign from the numbers and turn the column price to numeric.
Desired output
Name Price
AG 26.74
BAE 0.25
C&C Nan
CRH 30.57
What i was thinking of is :
df['Price'].map(lambda x:str(x)[1:])
But that would modify the Nan
value , to an
, and I would like to remain the rows with the Nan values.
Also I thought :
df['Price'].split("€")[1]
but outputs
'Series' object has no attribute 'split'
Is there a better way to obtain the desired output remaining the Nan values?
Upvotes: 1
Views: 491
Reputation: 402922
Here's how I'd do it: pd.to_numeric
+ str.replace
.
df.Price = pd.to_numeric(df.Price.str.replace('[^\d.]', ''), errors='coerce')
print(df.Price)
0 26.74
1 0.25
2 NaN
3 30.57
Name: Price, dtype: float64
The regular expression [^\d.]
removes anything that is not a valid symbol in a currency value.
If you want to consider other representations (such as exponents and -ve numbers), use [^\d.e-]
instead.
Upvotes: 2
Reputation: 210932
In [27]: df.Price = pd.to_numeric(df.Price.str.replace(r'[€]', ''), errors='coerce')
In [28]: df
Out[28]:
Name Price
0 AG 26.74
1 BAE 0.25
2 C&C Group NaN
3 CRH 30.57
In [29]: df.dtypes
Out[29]:
Name object
Price float64 # <-----
dtype: object
Upvotes: 2