Elias Mi

Reputation: 711

Avoid type casting when selecting a row

I've got a DataFrame with enforced data types, which are quite important to my application:

import pandas as pd

df = (pd.DataFrame([(1, 1, 1000),
                    (1, 2, 2000)],
                   columns=['id', 'fk', 'value'])
      .astype({'id': pd.Int32Dtype(),
               'fk': pd.Int32Dtype(),
               'value': pd.Float32Dtype()}))

df.dtypes.to_dict()

correctly yields:

{'id': Int32Dtype(), 'fk': Int32Dtype(), 'value': Float32Dtype()}

However, when I pick one row using .iloc, pandas suddenly casts everything to float, presumably because the row becomes a Series, which can only have a single data type:

df.iloc[0].dtypes

yields:

Float64Dtype()

That causes downstream problems, as I need the data in the correct types. How can I pull out a single row while maintaining the correct types?

Upvotes: 2

Views: 582

Answers (2)

Elias Mi

Reputation: 711

As per comments by @MichaelSzczesny and @Corralien (thank you!), I understand it's not possible to have a Series with multiple types, but a DataFrame with one row (e.g. df.iloc[[0]]) keeps the data types.

I've now used that to extract the data into a dict instead, keeping the base data types (int, float etc.), which is good enough for my use case:

df.iloc[[0]].to_dict('records')[0]

yields:

{'id': 1, 'fk': 1, 'value': 1000.0}
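For anyone wanting to check this themselves, here is a minimal sketch of the approach above, assuming the same df as in the question. The names one_row, dtypes_preserved and record are only illustrative.

```python
import pandas as pd

df = (pd.DataFrame([(1, 1, 1000),
                    (1, 2, 2000)],
                   columns=['id', 'fk', 'value'])
      .astype({'id': pd.Int32Dtype(),
               'fk': pd.Int32Dtype(),
               'value': pd.Float32Dtype()}))

# A list indexer ([[0]]) returns a one-row DataFrame rather than a Series,
# so each column keeps its own dtype.
one_row = df.iloc[[0]]
dtypes_preserved = one_row.dtypes.to_dict() == df.dtypes.to_dict()

# Convert that single row into a dict of scalars, one entry per column.
record = one_row.to_dict('records')[0]

print(dtypes_preserved)
print(record)
```

Note that the dict holds plain scalars, so the Int32/Float32 distinction itself is not carried along; only the int-versus-float split survives, which was enough in my case.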

Upvotes: 0

Corralien

Reputation: 120469

You want to extract a Series (one row) from a DataFrame:

>>> df
   id  fk   value
0   1   1  1000.0  # Int32, Int32, Float32
1   1   2  2000.0

>>> df.iloc[0]
id          1.0
fk          1.0
value    1000.0
Name: 0, dtype: Float64

So you have 2 columns of Int32 and 1 column of Float32. However, a Series (like a single column of a DataFrame) cannot mix dtypes, so pandas has to cast the row to a common dtype that can hold all of its values. Here, that is Float64.
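If a Series is really needed, one possible workaround (a sketch, not the only option) is to give it object dtype: the Series still has a single dtype, but each element is stored as-is. The sketch below assumes the df from the question and reads each value from its own column, so no cross-column cast takes place. The name row is only illustrative.

```python
import pandas as pd

df = (pd.DataFrame([(1, 1, 1000),
                    (1, 2, 2000)],
                   columns=['id', 'fk', 'value'])
      .astype({'id': pd.Int32Dtype(),
               'fk': pd.Int32Dtype(),
               'value': pd.Float32Dtype()}))

# Take the first value of every column separately, then collect the scalars
# in an object-dtype Series so pandas does not look for a common numeric dtype.
row = pd.Series({col: df[col].iloc[0] for col in df.columns}, dtype=object)

print(row.dtype)
print(type(row['id']), type(row['value']))
```

The trade-off is that an object Series loses vectorized numeric operations, so this is probably only worthwhile when the row is consumed value by value.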

Now a different case:

>>> df = pd.DataFrame([(1, 1, 1000), (1, 2, 2000)], columns=['id', 'fk', 'value']) \
...        .astype({'id': pd.Int8Dtype(), 'fk': pd.Int16Dtype(), 'value': pd.Int32Dtype()})
>>> df.dtypes
id        Int8
fk       Int16
value    Int32
dtype: object

>>> df.iloc[0]
id          1
fk          1
value    1000
Name: 0, dtype: Int32

In this case, pandas finds a common dtype that is a superset of the others: Int32 can represent every Int8 and Int16 value, so the row stays an integer Series.
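The two outcomes above match NumPy's own promotion rules, which can be inspected with np.result_type. This is only an analogy using plain NumPy dtypes, on the assumption that the nullable pandas dtypes follow the same logic; it is not the routine pandas calls internally.

```python
import numpy as np

# First case: two 32-bit integer columns and one 32-bit float column.
# float32 cannot represent every int32 exactly, so the result widens to float64.
mixed = np.result_type(np.int32, np.int32, np.float32)

# Second case: integers of increasing width; the widest one holds them all.
ints_only = np.result_type(np.int8, np.int16, np.int32)

print(mixed)      # float64
print(ints_only)  # int32
```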

Upvotes: 1
