johnjohn

Reputation: 892

Convert fractions stored as characters to float64

Say we have this df:

import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland']
})

Out:

        value   country
    0   18 4/2  USA
    1   2 2/2   Canada
    2   8.5     Switzerland

Note that the 'value' column has dtype object:

df.dtypes

Out:

value      object
country    object
dtype: object

My question: how can we convert the 'value' column to decimal numbers and, at the same time, change its dtype to float64? Note that one value (8.5) is already a decimal and should be kept as is. Desired output:

desired_output = pd.DataFrame({
        'value': [20, 3, 8.5],
        'country': ['USA', 'Canada', 'Switzerland']
})


   value      country
0   20.0          USA
1    3.0       Canada
2    8.5  Switzerland


desired_output.dtypes

value       float64
country     object
dtype: object

Upvotes: 1

Views: 183

Answers (2)

Ben.T

Reputation: 29635

You can replace the space with a + sign and then apply eval:

print(df['value'].str.replace(' ', '+').apply(eval))
0    20.0
1     3.0
2     8.5
Name: value, dtype: float64
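Keep in mind that Python's built-in eval executes arbitrary code, so it should only be applied to data you trust. The sketch below follows the same "space means plus" idea without eval: each whitespace-separated token is parsed with fractions.Fraction, which accepts integers, fractions like "4/2" and decimals like "8.5", and the tokens are summed. The helper name mixed_to_float is hypothetical, and like the ' ' -> '+' trick it would read "-1 1/2" as -1 + 1/2.

```python
from fractions import Fraction

import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland'],
})


def mixed_to_float(s: str) -> float:
    # Hypothetical helper: parse every token ('18', '4/2', '8.5') as an
    # exact Fraction and add them, mirroring ' ' -> '+' without eval.
    # An unparseable token makes Fraction raise ValueError.
    return float(sum(Fraction(tok) for tok in s.split()))


df['value'] = df['value'].map(mixed_to_float)
print(df['value'])
```

The arithmetic stays exact until the final float() call, so no rounding happens while the parts are added.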

Or, using pd.eval:

df['value'] = pd.eval(df['value'].str.replace(' ', '+')).astype(float)
print(df)
  value      country
0  20.0          USA
1   3.0       Canada
2   8.5  Switzerland
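A vectorized alternative that does not evaluate strings at all is to pull the whole number, numerator and denominator out with Series.str.extract and combine them arithmetically. This is a sketch that assumes every entry is either a plain non-negative number such as "8.5" or a mixed number "whole numerator/denominator"; the regex and the group names whole, num and den are illustrative choices, and entries that don't match the pattern come out as NaN instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland'],
})

# Assumed format: "8.5" or "18 4/2" (non-negative; fraction part optional).
parts = df['value'].str.extract(
    r'^(?P<whole>\d+(?:\.\d+)?)(?:\s+(?P<num>\d+)/(?P<den>\d+))?$'
)

whole = pd.to_numeric(parts['whole'])
# Rows without a fraction part have NaN in num/den, so their fraction is 0.
frac = pd.to_numeric(parts['num']) / pd.to_numeric(parts['den'])
df['value'] = whole + frac.fillna(0)
print(df)
```

Because malformed rows show up as NaN in parts['whole'], checking parts['whole'].isna() before assigning is a cheap way to find entries the pattern doesn't cover.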

Upvotes: 1

user5386938


I'd go with @Ben.T's answer, but since I had already experimented with this, here's my attempt.

>>> import pandas as pd
>>> df = pd.DataFrame({
...             'value': ['18 4/2', '2 2/2', '8.5'],
...             'country': ['USA', 'Canada', 'Switzerland']
...     })
>>> df
    value      country
0  18 4/2          USA
1   2 2/2       Canada
2     8.5  Switzerland
>>> def foo(s):
...     try:
...         return float(s)  # plain numbers such as '8.5'
...     except ValueError:
...         pass
...     w, f = s.split()  # '18 4/2' -> '18', '4/2'
...     n, d = f.split('/')
...     w, n, d = map(int, (w, n, d))
...     return w + n / d
...
>>> foo('1')
1.0
>>> foo('18 4/2')
20.0
>>> df['value'] = df['value'].apply(foo)
>>> df
   value      country
0   20.0          USA
1    3.0       Canada
2    8.5  Switzerland
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   value    3 non-null      float64
 1   country  3 non-null      object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
>>>
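foo expects either a plain number or exactly "whole numerator/denominator", so a bare fraction like "4/2" or a missing value would make it raise. The sketch below is one way to generalize it; parse_mixed is a hypothetical name, and it assumes the usual convention that a leading minus on the whole part ("-1 1/2") negates the entire mixed number.

```python
import pandas as pd


def parse_mixed(s):
    """Parse '18 4/2', '4/2', '-1 1/2' or '8.5' to a float; NaN/None -> NaN."""
    if pd.isna(s):
        return float('nan')
    s = s.strip()
    try:
        return float(s)  # plain numbers such as '8.5' or '18'
    except ValueError:
        pass
    *whole, frac = s.split()  # the whole-number part is optional ('4/2')
    if len(whole) > 1:
        raise ValueError(f'unexpected format: {s!r}')
    n, d = (int(x) for x in frac.split('/'))
    if not whole:
        return n / d  # bare fraction: any sign is already on the numerator
    w = int(whole[0])
    # Assumed convention: in '-1 1/2' the minus applies to the whole value.
    sign = -1 if whole[0].startswith('-') else 1
    return w + sign * n / d


s = pd.Series(['18 4/2', '2 2/2', '8.5', '4/2', '-1 1/2', None])
result = s.apply(parse_mixed)
print(result)
```

Checking the sign with startswith('-') rather than w < 0 means "-0 1/2" is also read as -0.5.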

Upvotes: 1
