JamesHudson81

Reputation: 2273

Change the format of string values to numeric in a df

I have the following df, where some values are strings (those with %) and others are not.

                          test    overall
Quents Ratio            270.01%  256.02%
Amount sulphur            0.17     0.19
Amount salt                  -    20.89
amount silica             4.29%    6.84%
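To make the example reproducible, the frame above can be built by hand. This is a minimal sketch that assumes every cell is stored as a string (the question does not say how the data was loaded); dtype=object keeps the raw values as plain Python objects.

```python
import pandas as pd

# Assumption: all cells are strings, as they might be after reading a file
# where some values carry a '%' suffix and one uses '-' as a placeholder.
df = pd.DataFrame(
    {
        "test": ["270.01%", "0.17", "-", "4.29%"],
        "overall": ["256.02%", "0.19", "20.89", "6.84%"],
    },
    index=["Quents Ratio", "Amount sulphur", "Amount salt", "amount silica"],
    dtype=object,
)

print(df)
```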

I would like to make all the values numeric so that I can carry out some analysis across the two columns.

Desired output:

                          test    overall
Quents Ratio            270.01   256.02
Amount sulphur            0.17     0.19
Amount salt                  -    20.89
amount silica             4.29     6.84

What I have tried:

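def numeric_df(df):
    # select the two rows whose values end in '%'
    df_detail = df.loc[['Quents Ratio', 'amount silica'], :]
    # attempt to drop the last character ('%') of each value
    df_detail = df_detail.apply(lambda x: str(x)[:-1])
    return df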

But it returns the same initial df.

How could I obtain the desired output?

Upvotes: 0

Views: 70

Answers (1)

jezrael

Reputation: 862441

I think you need replace, but the values also contain -, so they cannot all be converted to numeric after this step alone:

df = df.replace('%', '', regex=True)
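A minimal sketch of what this first step leaves behind, assuming the sample frame is rebuilt with every cell as a string (the variable name stripped is only illustrative). With regex=True, the pattern is replaced inside each string rather than only when it matches a whole cell, so '270.01%' becomes '270.01', but the lone '-' is untouched.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame; all cells are strings.
df = pd.DataFrame(
    {
        "test": ["270.01%", "0.17", "-", "4.29%"],
        "overall": ["256.02%", "0.19", "20.89", "6.84%"],
    },
    index=["Quents Ratio", "Amount sulphur", "Amount salt", "amount silica"],
    dtype=object,
)

# regex=True: remove '%' wherever it occurs inside a cell
stripped = df.replace('%', '', regex=True)

print(stripped.loc["Quents Ratio", "test"])  # still a string, now without '%'
print(stripped.loc["Amount salt", "test"])   # the '-' placeholder survives

# Because of that '-', a strict conversion such as stripped.astype(float)
# raises ValueError, which is why the '-' needs separate handling.
```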

If you need all values to be numeric and some cells contain only the - character, replace those with NaN as well:

import numpy as np
import pandas as pd

df = df.replace({'%': '', '^-$': np.nan}, regex=True).astype(float)
print(df)
                  test  overall
Quents Ratio    270.01   256.02
Amount sulphur    0.17     0.19
Amount salt        NaN    20.89
amount silica     4.29     6.84
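A self-contained version of the dictionary-based replace, under the same assumption that all cells start out as strings. With regex=True the dictionary keys are treated as regular expressions: '%' is removed anywhere in a cell, while '^-$' only matches a cell consisting of a single '-', which is replaced by NaN. The column subtraction at the end is just an illustrative example of the analysis that becomes possible.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame; all cells are strings.
df = pd.DataFrame(
    {
        "test": ["270.01%", "0.17", "-", "4.29%"],
        "overall": ["256.02%", "0.19", "20.89", "6.84%"],
    },
    index=["Quents Ratio", "Amount sulphur", "Amount salt", "amount silica"],
    dtype=object,
)

# Keys are regexes: strip '%' everywhere, turn a cell that is exactly '-' into NaN
clean = df.replace({'%': '', '^-$': np.nan}, regex=True).astype(float)
print(clean)

# Both columns are float now, so column arithmetic works (NaN propagates)
print(clean["overall"] - clean["test"])
```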

Another solution uses to_numeric with errors='coerce', which also converts all non-numeric values to NaN:

df = df.replace('%', '', regex=True).apply(pd.to_numeric, errors='coerce')
print(df)
                  test  overall
Quents Ratio    270.01   256.02
Amount sulphur    0.17     0.19
Amount salt        NaN    20.89
amount silica     4.29     6.84
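The same to_numeric idea can also repair the function from the question. That version had no effect because it returned df instead of df_detail; in addition, DataFrame.apply passes each whole column to the lambda, so str(x) produced the text of an entire Series rather than of one cell. The sketch below is one possible rewrite (not the answer's original code), assuming the same all-string sample frame. It converts the whole frame and returns the result, leaving the input unchanged.

```python
import pandas as pd


def numeric_df(df):
    """Return a numeric copy of df: '%' stripped, non-numeric cells as NaN."""
    # replace() returns a new frame, so the caller's df is not modified
    out = df.replace('%', '', regex=True)
    # errors='coerce' turns anything still non-numeric (e.g. '-') into NaN
    return out.apply(pd.to_numeric, errors='coerce')


# Hypothetical reconstruction of the question's frame; all cells are strings.
df = pd.DataFrame(
    {
        "test": ["270.01%", "0.17", "-", "4.29%"],
        "overall": ["256.02%", "0.19", "20.89", "6.84%"],
    },
    index=["Quents Ratio", "Amount sulphur", "Amount salt", "amount silica"],
    dtype=object,
)

result = numeric_df(df)
print(result)
```

Returning a new frame (rather than mutating df in place) keeps the raw data available for comparison.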

Upvotes: 1
