Reputation: 1098
Problem:
I have a pandas.DataFrame that stores only unicode values. Each column contains values that could be converted to an integer or a float, or left as unicode. (Python version 2.7.15, pandas version 0.23.0)
df = pd.DataFrame({'x':[u'1', u'1.23', u'', u'foo_text'], 'y':[u'bar_text', u'', u'2', u'4.56']})
print df
          x         y
0         1  bar_text
1      1.23
2                   2
3  foo_text      4.56
I would like to convert each cell to an int where possible, otherwise to a float, and otherwise leave it as unicode.
Solution attempts:
The following code does precisely what I want:
type_list = [int, float, unicode]
for column in df.columns:
    for index in df.index:
        # try int first, then float; the final unicode() always
        # succeeds, so text cells are left as they are
        for desired_type in type_list:
            try:
                df.loc[index, column] = desired_type(df.loc[index, column])
                break
            except ValueError:
                pass
The problem is that my actual DataFrame has more than 10 million cells, and this takes far too long to execute. I am trying to find a faster way to do this.
I have looked at pandas.DataFrame.infer_objects() and pandas.to_numeric(), but neither appears to handle the case of mixed types within a single column.
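For example, on column x (a minimal check, assuming pandas 0.23 behaviour), errors='coerce' throws the text away and errors='ignore' gives up on the whole column:
import pandas as pd

s = pd.Series([u'1', u'1.23', u'', u'foo_text'])

# non-numeric cells become NaN, so the text is lost
print pd.to_numeric(s, errors='coerce')

# a single unparseable cell means the whole column is returned unchanged
print pd.to_numeric(s, errors='ignore')

# the column stays object dtype; per-cell conversion is not attempted
print s.infer_objects().dtype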
Upvotes: 0
Views: 4249
Reputation: 124
Try using a function along with .apply(), which will be a lot faster than three nested for-loops. Something like:
def change_dtype(value):
    # try int first, then float; fall back to the original unicode value
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value

for column in df.columns:
    df.loc[:, column] = df[column].apply(change_dtype)
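Applied to the example frame, each cell comes back as an int, a float, or the original unicode, which you can verify with applymap (a quick sketch):
df = pd.DataFrame({'x': [u'1', u'1.23', u'', u'foo_text'],
                   'y': [u'bar_text', u'', u'2', u'4.56']})

for column in df.columns:
    df.loc[:, column] = df[column].apply(change_dtype)

# shows the per-cell types: int/float for the numeric strings,
# unicode for the empty strings and the text
print df.applymap(type)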
Upvotes: 3