Reputation: 892
Say we have this df:
import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland']
})
Out:
value country
0 18 4/2 USA
1 2 2/2 Canada
2 8.5 Switzerland
Note the 'value' column stores an object type:
df.dtypes
Out:
value object
country object
dtype: object
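My question: how do we convert 'value' to decimal numbers, while also changing the data type to float64? Each mixed fraction should become its numeric value (for example, '18 4/2' is 18 + 4/2 = 20). Note that one value (8.5) is already a decimal and should be kept as is. Desired output: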
desired_output = pd.DataFrame({
    'value': [20, 3, 8.5],
    'country': ['USA', 'Canada', 'Switzerland']
})
value country
0 20.0 USA
1 3.0 Canada
2 8.5 Switzerland
desired_output.dtypes
value float64
country object
dtype: object
Upvotes: 1
Views: 183
Reputation: 29635
You can replace the space with a + sign and then apply eval:
print(df['value'].str.replace(' ', '+').apply(eval))
0 20.0
1 3.0
2 8.5
Name: value, dtype: float64
Or, using pd.eval:
df['value'] = pd.eval(df['value'].str.replace(' ', '+')).astype(float)
print(df)
value country
0 20.0 USA
1 3.0 Canada
2 8.5 Switzerland
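Since eval executes whatever Python expression it is given, it may be worth avoiding on data you don't control. Here is a minimal sketch of an eval-free variant of the same "add the parts together" idea, using the standard library's fractions.Fraction. It assumes every entry is made of whitespace-separated tokens that are each an integer, a decimal, or a simple n/d fraction; the helper name mixed_to_float is made up for illustration.

```python
from fractions import Fraction


def mixed_to_float(text):
    """Convert strings like '18 4/2', '2 2/2' or '8.5' to a float without eval.

    Hypothetical helper: assumes each whitespace-separated token is an
    integer, a decimal, or an 'n/d' fraction.
    """
    # Fraction parses '18', '4/2' and '8.5' alike; summing the tokens
    # mirrors replacing the space with '+'.
    return float(sum(Fraction(part) for part in text.split()))


values = [mixed_to_float(v) for v in ['18 4/2', '2 2/2', '8.5']]
print(values)
```

On the DataFrame this would be applied as df['value'] = df['value'].apply(mixed_to_float), which should also give a float64 column.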
Upvotes: 1
Reputation:
I'd go with @Ben.T's answer but since I already played around, here's my attempt.
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'value': ['18 4/2', '2 2/2', '8.5'],
...     'country': ['USA', 'Canada', 'Switzerland']
... })
>>> df
value country
0 18 4/2 USA
1 2 2/2 Canada
2 8.5 Switzerland
>>> def foo(s):
...     try:
...         return float(s)
...     except ValueError:
...         pass
...     w, f = s.split()
...     n, d = f.split('/')
...     w, n, d = map(int, (w, n, d))
...     return w + n / d
...
>>> foo('1')
1.0
>>> foo('18 4/2')
20.0
>>> df['value'] = df['value'].apply(foo)
>>> df
value country
0 20.0 USA
1 3.0 Canada
2 8.5 Switzerland
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 value 3 non-null float64
1 country 3 non-null object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
>>>
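If the real column might contain entries that fit neither pattern, foo raises and the whole apply stops. Below is a rough sketch of a more forgiving variant that returns NaN for anything it can't parse. The name parse_mixed and the choice of NaN as the fallback are assumptions for illustration, not part of the original attempt.

```python
import math


def parse_mixed(s):
    """Like foo above, but return NaN instead of raising on bad input."""
    try:
        # Plain numbers such as '8.5' are handled directly.
        return float(s)
    except (TypeError, ValueError):
        pass
    try:
        whole, frac = s.split()
        num, den = frac.split('/')
        whole, num, den = map(int, (whole, num, den))
        return whole + num / den
    except (AttributeError, ValueError, ZeroDivisionError):
        # Non-string input, wrong shape, non-integer parts, or a zero denominator.
        return math.nan


results = [parse_mixed(v) for v in ['18 4/2', '8.5', 'n/a', '1 1/0', None]]
print(results)
```

The first two inputs parse as before (20.0 and 8.5), while the malformed ones come back as NaN, so the column still ends up as float64 and the bad rows can be inspected afterwards with isna().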
Upvotes: 1