Reputation: 2273
I have this df, which I got using the read_html function:
    0      1         2      3
1  AB  16.38  16197.69  19/05
2  AC  81.48   4671.23  19/05
3  AR  12.10   3329.37  19/05
4  AS  35.69  11178.46  19/05
The second and third columns are numbers, but they are recognised as str.
I would like to turn them into floats, because I want to divide each value in column 2 by that column's total.
The desired output would be something like this:
    0      1      2      3
1  AB  16.38  0.457  19/05
2  AC  81.48  0.132  19/05
3  AR  12.10  0.094  19/05
4  AS  35.69  0.315  19/05
This is what I have tried:
First, specifying the decimal and thousands separators when reading:
pd.read_html('http:// whatever', flavor='html5lib', thousands='.',decimal=',')
Second, converting the df to numeric:
df.apply(pd.to_numeric, errors='ignore')
When I apply the desired formula to the column:
df.loc[:,2]/df.loc[:,2].sum()
The following error appears :
unsupported operand type(s) for /: 'str' and 'str'
I would just like to convert the column to a numeric type so that I can apply the operation above.
Upvotes: 2
Views: 65
Reputation: 863226
I think you need to_numeric with errors='coerce', which converts non-numeric values to NaN:
df[1] = pd.to_numeric(df[1], errors='coerce')
df[2] = pd.to_numeric(df[2], errors='coerce')
But first you can check which rows contain values that fail to parse:
print (df[pd.to_numeric(df[1], errors='coerce').isnull()])
print (df[pd.to_numeric(df[2], errors='coerce').isnull()])
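Putting the pieces together, here is a minimal sketch of the whole workflow. It builds the sample frame by hand instead of calling read_html, and it assumes the numeric columns arrive as strings with a dot as the decimal separator, as in the question. Note that the converted columns have to be assigned back to df; calling to_numeric or apply without assignment leaves the original frame unchanged.

```python
import pandas as pd

# Hand-built stand-in for the read_html result (an assumption for illustration):
# columns 1 and 2 hold numbers stored as strings.
df = pd.DataFrame(
    {
        0: ["AB", "AC", "AR", "AS"],
        1: ["16.38", "81.48", "12.10", "35.69"],
        2: ["16197.69", "4671.23", "3329.37", "11178.46"],
        3: ["19/05"] * 4,
    },
    index=[1, 2, 3, 4],
)

# Convert both columns and assign the result back;
# values that cannot be parsed become NaN instead of raising.
for col in [1, 2]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Replace column 2 with each value's share of the column total.
df[2] = df[2] / df[2].sum()

print(df.round(3))
```

If any value had been coerced to NaN, sum() would skip it by default, so it is worth running the isnull() check above before trusting the shares.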
Upvotes: 1