Reputation: 77
Suppose we have the following dataframe.
import pandas as pd
df = pd.DataFrame([['1','2'], ['3', '4']])
If we look at dtypes, every column has dtype object:
df.dtypes
0 object
1 object
dtype: object
I want to have them as int or float. So if I do
df_apply = df.apply(pd.to_numeric)
I get
df_apply.dtypes
0 int64
1 int64
dtype: object
Same if I do
df_astype_int = df.astype(int)
df_astype_int.dtypes
0 int64
1 int64
dtype: object
Or
df_astype_float = df.astype(float)
df_astype_float.dtypes
0 float64
1 float64
dtype: object
Or
df_loop = df.copy()
for i in df_loop:
    df_loop[i] = pd.to_numeric(df_loop[i])
df_loop.dtypes
0 int64
1 int64
dtype: object
But if I do
df_loop = df.copy()
for index in df_loop.index:
    df_loop.loc[index,:] = pd.to_numeric(df_loop.loc[index,:])
df_loop.dtypes
0 object
1 object
dtype: object
the columns are object again. Why is that? It doesn't raise an error, but it doesn't convert either.
Upvotes: 1
Views: 252
Reputation: 22493
In this loop:
for index in df_loop.index:
    df_loop.loc[index,:] = pd.to_numeric(df_loop.loc[index,:])
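you are assigning a Series with dtype int to a single row. Because pandas stores data by column, each value is written into an existing object column next to the strings that are still there, so the column ends up as object, the common NumPy dtype of all the values in that column. Even after the last row has been converted, pandas does not re-infer the column dtype. For more detail, see numpy.find_common_type (deprecated in recent NumPy releases, where numpy.result_type is the suggested replacement).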
But when you convert the whole DataFrame or one column at a time, each column is replaced outright, so the new dtype sticks, just like in the other examples you provided.
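If the row-wise loop has to stay, one possible workaround is to ask pandas to re-infer the dtypes afterwards. The following is a minimal sketch, not the only way to do it. It assumes that `DataFrame.infer_objects` is acceptable for your data, and it passes `dtype=object` explicitly (an assumption that is not in the question) so that the columns start out as object regardless of how your pandas version infers strings. The names `df_rows`, `df_inferred` and `df_cols` are made up for illustration.

```python
import pandas as pd

# dtype=object is passed explicitly (an assumption of this sketch) so the
# columns are object no matter how a given pandas version infers strings.
df = pd.DataFrame([['1', '2'], ['3', '4']], dtype=object)

# Row-wise assignment from the question: the integers are written into the
# existing object columns, so the column dtypes are not changed.
df_rows = df.copy()
for index in df_rows.index:
    df_rows.loc[index, :] = pd.to_numeric(df_rows.loc[index, :])
print(df_rows.dtypes)

# Option 1: re-infer the dtypes of the object columns after the loop.
df_inferred = df_rows.infer_objects()
print(df_inferred.dtypes)

# Option 2: skip the row loop and convert column by column instead.
df_cols = df.apply(pd.to_numeric)
print(df_cols.dtypes)
```

Option 2 is usually the simpler choice, since it never goes through the intermediate object columns.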
Upvotes: 1