I am a beginner in Python and made my first venture into Pandas today. What I want to do is to convert several columns from string to float. Here's a quick example:
import numpy as np
import pandas as pd
def convert(str):
    try:
        return float(str.replace(',', ''))
    except:
        return None

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---']
], columns=['Company Name', 'X', 'Y'])
I want to convert X and Y to float. In reality there are more columns, and I don't always know the names of the X and Y columns, so I must use integer indexing.
This works:
df.iloc[:, 1] = df.iloc[:, 1].apply(convert)
df.iloc[:, 2] = df.iloc[:, 2].apply(convert)
This doesn't:
df.iloc[:, 1:2] = df.iloc[:, 1:2].apply(convert)
# Error: could not broadcast input array from shape (2) into shape (2,1)
Is there any way to apply the convert function to multiple columns at once?
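There are several issues with your logic:

1. The slice 1:2 excludes 2, consistent with list slicing or slice object syntax. Use 1:3 instead.
2. pd.Series.apply works element-wise on a single series, but pd.DataFrame.apply passes each whole column (a Series) to your function rather than each value, so convert never receives individual strings. To apply an element-wise function to a dataframe, you need pd.DataFrame.applymap (renamed pd.DataFrame.map in pandas 2.1, where applymap is deprecated).
3. Don't shadow built-ins: use mystr or x instead of str as a variable or argument name.
4. When using a try / except construct, you should generally specify the error type(s), in this case ValueError.

Therefore, this is one solution: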
def convert(x):
    try:
        return float(x.replace(',', ''))
    except ValueError:
        return None

df.iloc[:, 1:3] = df.iloc[:, 1:3].applymap(convert)
print(df)
   Company Name     X       Y
0             A  1234  456789
1             B     1     NaN
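The points about slicing, DataFrame.apply and catching only ValueError can be checked directly. This is a minimal sketch that rebuilds the example dataframe: it shows that iloc slices are end-exclusive, that DataFrame.apply hands the function whole columns (Series) by default, and that except ValueError still lets a non-string cell, such as a float NaN, raise AttributeError instead of hiding it.

```python
import pandas as pd

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---'],
], columns=['Company Name', 'X', 'Y'])

# iloc slices are end-exclusive, like list slices:
# 1:2 selects only column 1, while 1:3 selects columns 1 and 2.
print(list(df.iloc[:, 1:2].columns))  # ['X']
print(list(df.iloc[:, 1:3].columns))  # ['X', 'Y']

# DataFrame.apply calls the function once per column (default axis=0)
# and passes the whole column as a Series, not individual cell values.
print(df.iloc[:, 1:3].apply(lambda col: type(col).__name__).tolist())
# ['Series', 'Series']

def convert(x):
    try:
        return float(x.replace(',', ''))
    except ValueError:
        # Only parse failures land here, e.g. float('---').
        return None

print(convert('---'))  # None

# A float NaN has no .replace method, so AttributeError propagates
# rather than being silently turned into None by a bare except.
try:
    convert(float('nan'))
except AttributeError as exc:
    print('AttributeError:', exc)
```

Catching only the expected error keeps genuinely unexpected input visible instead of quietly producing None.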
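However, this approach is inefficient because convert is called once per cell in Python; you should look to leverage column-wise (vectorised) operations wherever possible. This can be achieved via pd.DataFrame.apply, which calls the function once per column, along with the vectorised str.replace and pd.to_numeric applied to each series: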
def convert_series(x):
    return pd.to_numeric(x.str.replace(',', ''), errors='coerce')

df.iloc[:, 1:3] = df.iloc[:, 1:3].apply(convert_series)
print(df)
   Company Name     X       Y
0             A  1234  456789
1             B     1     NaN
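Two caveats if the columns must end up as float specifically. First, pd.to_numeric infers an integer dtype when every value in a column parses as an integer, so column X comes back as integers rather than floats. Second, in pandas 2.0 and later, df.iloc[:, 1:3] = ... tries to write the new values into the existing object-dtype columns in place, so the columns may stay object dtype. The following is a minimal sketch under those assumptions: it casts with astype(float) and assigns by label, picking the labels by position with df.columns[1:3] since the names aren't always known. The helper name to_float_series is illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---'],
], columns=['Company Name', 'X', 'Y'])

def to_float_series(s):
    # Hypothetical helper: strip thousands separators literally
    # (regex=False), parse with errors='coerce' so strings such as '---'
    # become NaN, then force float64 even if every value was an integer.
    cleaned = s.str.replace(',', '', regex=False)
    return pd.to_numeric(cleaned, errors='coerce').astype(float)

# Choose the columns by integer position, then assign by label so the
# converted columns replace the old object-dtype columns outright.
cols = df.columns[1:3]
df[cols] = df[cols].apply(to_float_series)

print(df.dtypes)
print(df)
```

In pandas 1.5 and later, df.isetitem(i, values) is another way to replace a single column by integer position, which also works when column names are duplicated.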