preventing pandas.DataFrame.loc from typecasting

Question

If we initialise a pandas.DataFrame whose columns are both of type int64:

import numpy as np
import pandas as pd 

df1 = pd.DataFrame(np.arange(4).reshape((2,2)), columns=['one','two'])

and then cast the first column to str and look at the second row:

(1)

df2 = df1.astype({'one': str})  # np.str was only an alias of the built-in str and was removed in NumPy 1.24
df2.loc[1,]

df2.loc[1,] returns a pandas.Series of dtype object, in which each element keeps its original type.

However, if we instead cast the first column to float:

(2)

df3 = df1.astype({'one': float})  # np.float was likewise an alias of float, removed in NumPy 1.24
df3.loc[1,]

df3.loc[1,] returns a pandas.Series of dtype float64, i.e. the int64 value in column 'two' was promoted to float64.
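For reference, here is a runnable version of both cases that prints the dtypes explicitly. It is a sketch that uses the built-in str and float in place of np.str and np.float (removed in NumPy 1.24); np.result_type is added here only to show NumPy's promotion rule for int64 and float64.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: both columns hold integers.
df1 = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=["one", "two"])

# Case (1): 'one' becomes a string column, 'two' stays integer.
df2 = df1.astype({"one": str})
row2 = df2.loc[1]
print(row2.dtype)  # object: strings and ints have no numeric common type

# Case (2): 'one' becomes float64, 'two' stays integer.
df3 = df1.astype({"one": float})
row3 = df3.loc[1]
print(row3.dtype)  # float64: the common type of the two columns
print(np.result_type(np.int64, np.float64))  # float64, NumPy's promotion rule

# The frame itself still stores 'two' as an integer column;
# only the selected row is upcast.
print(df3["two"].dtype)
```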

Is there a way to make df.loc always preserve the element types, as in (1), and avoid the behaviour in (2)?

(And why would I care? Because ints can be used as indexes and floats can't, and I'm slightly annoyed at having to recast values because pandas decided that the return value should differ from what I originally put into my DataFrame.)
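The indexing annoyance described above can be reproduced with a short sketch. The frame's values and the `labels` list are hypothetical example data; the point is that the upcast value is a NumPy float64, which Python refuses as a list index.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring case (2): 'one' is float64, 'two' is int64.
df3 = pd.DataFrame({"one": [0.0, 2.0], "two": [1, 3]})
labels = ["a", "b", "c", "d"]  # hypothetical list to index into

value = df3.loc[1]["two"]  # comes back as numpy.float64, not an integer
try:
    labels[value]
except TypeError as exc:
    # floats (including numpy.float64) cannot be used as list indices
    print("TypeError:", exc)

print(labels[int(value)])  # the manual recast the question wants to avoid; prints d
```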

ymzkala · Accepted Answer

When ints and floats are combined in a single Series, pandas casts the ints to floats, as you have discovered. One way around this is to give the DataFrame dtype=object when you create it:

import numpy as np
import pandas as pd 

df1 = pd.DataFrame(np.arange(4).reshape((2,2)), columns=['one','two'], dtype=object)

df2 = df1.astype({'one': str})  # built-in str instead of the removed np.str
df2.loc[1,]

one    2
two    3
Name: 1, dtype: object

df3 = df1.astype({'one': float})  # built-in float instead of the removed np.float
df3.loc[1,]

one    2.0
two      3
Name: 1, dtype: object
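If making the whole frame object dtype up front is undesirable (object columns give up fast vectorised numeric operations), a few other approaches also avoid the upcast. These are not part of the accepted answer; treat them as a sketch assuming a recent pandas version.

```python
import numpy as np
import pandas as pd

# Case (2) from the question: 'one' is float64, 'two' is an integer column.
df3 = pd.DataFrame(np.arange(4).reshape((2, 2)), columns=["one", "two"])
df3 = df3.astype({"one": float})

# Option A: cast to object only when selecting the row, so the stored
# columns keep their numeric dtypes for everything else.
row = df3.astype(object).loc[1]
print(row.dtype)  # object; row["two"] is still an integer

# Option B: read a single cell with .at; the value comes from the
# integer column, so it is a NumPy integer and works as a list index.
two = df3.at[1, "two"]

# Option C: itertuples() iterates each column separately, so no common
# dtype has to be found for the row and 'two' stays an integer.
rows = list(df3.itertuples(index=False))
print(rows[1])
```

Option A behaves like the accepted answer but keeps the conversion local to the lookup; options B and C sidestep building a mixed-type row Series altogether.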
