JamesHudson81
JamesHudson81

Reputation: 2273

changing format of a df and removing undesired char

I have the following dataframe:

Name             Price  
AG              €26.74
BAE             €0.25
C&C             Nan
CRH             €30.57

I would like to remove the euro sign from the numbers and turn the column price to numeric.

Desired output

Name            Price   
AG              26.74
BAE             0.25
C&C             Nan
CRH             30.57

What i was thinking of is :

df['Price'].map(lambda x:str(x)[1:])

But that would modify the Nan value , to an, and I would like to remain the rows with the Nan values.

Also I thought :

df['Price'].split("€")[1]

but outputs

'Series' object has no attribute 'split'

Is there a better way to obtain the desired output remaining the Nan values?

Upvotes: 1

Views: 491

Answers (2)

cs95
cs95

Reputation: 402922

Here's how I'd do it: pd.to_numeric + str.replace.

df.Price = pd.to_numeric(df.Price.str.replace('[^\d.]', ''), errors='coerce')

print(df.Price)
0    26.74
1     0.25
2      NaN
3    30.57
Name: Price, dtype: float64

The regular expression [^\d.] removes anything that is not a valid symbol in a currency value.

If you want to consider other representations (such as exponents and -ve numbers), use [^\d.e-] instead.

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210932

In [27]: df.Price = pd.to_numeric(df.Price.str.replace(r'[€]', ''), errors='coerce')

In [28]: df
Out[28]:
        Name  Price
0         AG  26.74
1        BAE   0.25
2  C&C Group    NaN
3        CRH  30.57

In [29]: df.dtypes
Out[29]:
Name      object
Price    float64   # <-----
dtype: object

Upvotes: 2

Related Questions