Reputation: 300
If we initialise a pandas.DataFrame where the type will be int64:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(4).reshape((2,2)), columns=['one','two'])
and then typecast the first column to str and look at the second row:
(1)
df2 = df1.astype({'one': str})
df2.loc[1,]
df2.loc[1,] returns a pandas.Series having type object, with the elements' types preserved.
However, if we typecast the first column to float:
(2)
df3 = df1.astype({'one': float})
df3.loc[1,]
df3.loc[1,] returns a pandas.Series having type float64, i.e. the int64 in column 'two' was promoted to float64.
Is there a way to ensure that df.loc always preserves the types as in (1) and avoids the behaviour in (2)?
(And why would I care? Because ints can be passed as indexes, floats can't, and I'm slightly annoyed at having to recast objects because pandas decided that what I wanted as a return value isn't what I had put into my dataframe originally.)
Upvotes: 2
Views: 792
Reputation: 323
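When you combine ints and floats in a Series, pandas casts the ints to floats, as you have discovered: a row returned by df.loc is a single Series with one dtype, so pandas has to pick a common dtype for all of its elements. One way to get around this is to set dtype=object in your dataframe, like so: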
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(4).reshape((2,2)), columns=['one','two'], dtype=object)
df2 = df1.astype({'one': str})
df2.loc[1,]
one 2
two 3
Name: 1, dtype: object
df3 = df1.astype({'one': float})
df3.loc[1,]
one 2.0
two 3
Name: 1, dtype: object
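If you would rather keep the native column dtypes in the dataframe itself, a possible alternative is to convert to object only at the moment you pull out the row, or to read single cells with df.at. The following is a minimal sketch of both ideas, not the only way to do it. The variable names (upcast_row, object_row, cell, common) are made up for illustration, and it uses the builtin str/float because the np.str/np.float aliases were removed in NumPy 1.24.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=['one', 'two'])
df3 = df1.astype({'one': float})  # 'one' is float64, 'two' stays integer

# A row Series can only have one dtype, so integer + float64 is upcast.
upcast_row = df3.loc[1]
common = np.result_type(np.int64, np.float64)  # the NumPy promotion rule

# Option A: cast to object right before selecting the row.
# df3 itself is not modified and keeps its column dtypes.
object_row = df3.astype(object).loc[1]

# Option B: read one cell at a time; no mixed-dtype row is ever built.
cell = df3.at[1, 'two']

print(upcast_row.dtype, common)
print(object_row.dtype, type(object_row['two']), type(cell))
```

Option A costs a copy of the frame on every call, so for large dataframes it is probably better to select the row as a one-row DataFrame first (df3.loc[[1]]) and convert only that.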
Upvotes: 1