JamesHudson81

Reputation: 2273

Struggling to convert the columns of a DataFrame from strings to floats

I have this df, which I got using pd.read_html:

    0      1         2      3
1  AB  16.38  16197.69  19/05
2  AC  81.48   4671.23  19/05
3  AR  12.10   3329.37  19/05
4  AS  35.69  11178.46  19/05

The second and third columns (1 and 2) contain numbers; however, they are recognised as str.

I would like to convert them to floats so that, in the third column (column 2), I can divide each value by the column total.

The desired output would be something like this:

    0      1      2      3
1  AB  16.38  0.457  19/05
2  AC  81.48  0.132  19/05
3  AR  12.10  0.094  19/05
4  AS  35.69  0.315  19/05
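A minimal end-to-end sketch, assuming every cell arrives as a string exactly as printed above (the frame is rebuilt by hand here instead of being fetched with read_html): convert columns 1 and 2 to floats, then replace column 2 with each value's share of the column total.

```python
import pandas as pd

# Sample data rebuilt from the question; every cell is a string,
# mirroring what the question reports about the read_html result.
df = pd.DataFrame(
    {
        0: ["AB", "AC", "AR", "AS"],
        1: ["16.38", "81.48", "12.10", "35.69"],
        2: ["16197.69", "4671.23", "3329.37", "11178.46"],
        3: ["19/05", "19/05", "19/05", "19/05"],
    },
    index=[1, 2, 3, 4],
)

# Convert the numeric columns one at a time to float64.
# astype(float) raises ValueError if any cell is not a valid number.
for col in (1, 2):
    df[col] = df[col].astype(float)

# Replace column 2 with each value divided by the column total.
df[2] = df[2] / df[2].sum()

print(df.round(3))
```

Rounded to three decimals, column 2 comes out as 0.458, 0.132, 0.094 and 0.316, which matches the desired output up to rounding.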

This is what I have tried:

First, I tried specifying the decimal and thousands separators:

pd.read_html('http:// whatever', flavor='html5lib', thousands='.',decimal=',')
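If the page really does use "." as the thousands separator and "," as the decimal mark (an assumption: the table printed above shows "." as the decimal point) and the columns still come back as strings, the separators can be normalised by hand before converting. A minimal sketch with hypothetical European-formatted sample strings:

```python
import pandas as pd

# Hypothetical European-formatted strings ("." thousands, "," decimal).
raw = pd.Series(["16.197,69", "4.671,23", "3.329,37", "11.178,46"])

cleaned = (
    raw.str.replace(".", "", regex=False)   # drop the thousands separators
       .str.replace(",", ".", regex=False)  # turn the decimal comma into a point
)
values = pd.to_numeric(cleaned)  # now parses as float64
print(values)
```

The order of the two replacements matters: removing the "." first keeps it from being confused with the decimal point that the second replacement introduces.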

Then I tried converting the DataFrame to numeric:

df.apply(pd.to_numeric, errors='ignore')
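One likely reason this line has no visible effect: DataFrame.apply returns a new DataFrame and does not modify df in place, so the result has to be assigned. Also, with errors='ignore', to_numeric returns its input unchanged when any value fails to parse, so a single bad cell silently leaves the whole column as strings (that option is deprecated as of pandas 2.2). A minimal sketch using two rows from the table above:

```python
import pandas as pd

df = pd.DataFrame(
    {0: ["AB", "AC"], 1: ["16.38", "81.48"], 2: ["16197.69", "4671.23"]}
)
num_cols = [1, 2]

# apply() returns a NEW DataFrame; calling it without assigning the
# result leaves df untouched, so columns 1 and 2 are still strings.
df[num_cols].apply(pd.to_numeric)
print(df.dtypes)

# Keep the returned frame and copy its columns back into df.
converted = df[num_cols].apply(pd.to_numeric)
for col in num_cols:
    df[col] = converted[col]
print(df.dtypes)  # columns 1 and 2 are now float64
```

Leaving out errors='ignore' means a bad cell raises a ValueError instead of being skipped silently, which makes the problem visible.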

When I print the result of the desired formula on the column:

print(df.loc[:, 2] / df.loc[:, 2].sum())

the following error appears:

unsupported operand type(s) for /: 'str' and 'str'
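The error can be reproduced with a small sketch (two values from the table above, with dtype=object forced to match a column of plain Python strings): on strings, sum() concatenates instead of adding, and the element-wise division then attempts str / str.

```python
import pandas as pd

s = pd.Series(["16197.69", "4671.23"], dtype=object)  # column of strings

# Summing strings concatenates them rather than adding numbers.
total = s.sum()
print(repr(total))

# Dividing each string by the concatenated string raises TypeError.
raised = False
try:
    s / total
except TypeError as exc:
    raised = True
    print(exc)
```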

I just want to change the column's type so that I can apply the operation above.

Answers (1)

jezrael

Reputation: 863226

I think you need to_numeric with errors='coerce' to convert non-numeric values to NaN:

df[1] = pd.to_numeric(df[1], errors='coerce')
df[2] = pd.to_numeric(df[2], errors='coerce')
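A small sketch of what errors='coerce' does, using a hypothetical placeholder cell "n/a" that is not in the question's data: the unparseable cell becomes NaN, and because sum() skips NaN by default, the remaining shares still add up to 1.

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
df = pd.DataFrame({2: ["16197.69", "4671.23", "n/a", "11178.46"]})

df[2] = pd.to_numeric(df[2], errors="coerce")  # "n/a" becomes NaN
share = df[2] / df[2].sum()                    # sum() skips NaN by default
print(share)
```

The row with NaN also gets NaN as its share, so it stays visible in the result rather than disappearing.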

But first, you can check which values cannot be parsed with:

print (df[pd.to_numeric(df[1], errors='coerce').isnull()])

print (df[pd.to_numeric(df[2], errors='coerce').isnull()])
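The two checks can also be combined into one mask over both columns. A sketch with hypothetical cells written with a decimal comma (not from the question), which is a common reason values fail to parse:

```python
import pandas as pd

df = pd.DataFrame(
    {
        0: ["AB", "AC", "AR"],
        1: ["16.38", "81,48", "12.10"],
        2: ["16197.69", "4671.23", "3.329,37"],
    }
)

# True where a cell in column 1 or 2 cannot be parsed as a number
# (errors="coerce" turns those cells into NaN, which isna() flags).
bad = df[[1, 2]].apply(pd.to_numeric, errors="coerce").isna().any(axis=1)
print(df[bad])
```

Here rows 1 and 2 are flagged; the keyword errors="coerce" is forwarded by apply to each pd.to_numeric call.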
