Reputation: 711
I've got a DataFrame with enforced data types, which are quite important to my application:
import pandas as pd

df = (pd.DataFrame([(1, 1, 1000),
                    (1, 2, 2000)],
                   columns=['id', 'fk', 'value'])
      .astype({'id': pd.Int32Dtype(),
               'fk': pd.Int32Dtype(),
               'value': pd.Float32Dtype()}))
df.dtypes.to_dict()
correctly yields:
{'id': Int32Dtype(), 'fk': Int32Dtype(), 'value': Float32Dtype()}
However, when I pick one row using .iloc, Pandas suddenly casts everything to float, presumably because it turns the row into a Series, which needs a single data type:
df.iloc[0].dtypes
yields:
Float64Dtype()
That causes downstream problems, as I need the data in the correct types. How can I pull out a single row while maintaining the correct types?
Upvotes: 2
Views: 582
Reputation: 711
As per comments by @MichaelSzczesny and @Corralien (thank you!), I understand it's not possible to have a Series with multiple types, but a DataFrame with one row (e.g. df.iloc[[0]]) keeps the data types.
I've now used that to extract the data into a dict instead, keeping the base data types (int, float etc.), which is good enough for my use case:
df.iloc[[0]].to_dict('records')[0]
yields:
{'id': 1, 'fk': 1, 'value': 1000.0}
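For anyone who wants to reuse this, here is a minimal sketch of the same idea wrapped in a function. The helper name row_as_dict is hypothetical (it is not part of pandas), and the sketch assumes a pandas version that provides the nullable Float32Dtype.

```python
import pandas as pd

df = (pd.DataFrame([(1, 1, 1000),
                    (1, 2, 2000)],
                   columns=['id', 'fk', 'value'])
      .astype({'id': pd.Int32Dtype(),
               'fk': pd.Int32Dtype(),
               'value': pd.Float32Dtype()}))


def row_as_dict(frame, pos):
    """Hypothetical helper: return the row at integer position `pos` as a dict.

    Selecting with a list (frame.iloc[[pos]]) returns a one-row DataFrame,
    so each column keeps its own dtype instead of being cast to one
    common dtype as happens with frame.iloc[pos].
    """
    one_row = frame.iloc[[pos]]
    return one_row.to_dict(orient='records')[0]


# The one-row DataFrame has the same per-column dtypes as the original.
print(df.iloc[[0]].dtypes.to_dict())

# The integer columns are not turned into floats in the resulting dict.
print(row_as_dict(df, 0))
```

The double brackets are what matter here: df.iloc[0] asks for a Series, while df.iloc[[0]] asks for a DataFrame that happens to have a single row.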
Upvotes: 0
Reputation: 120469
You want to extract a Series (one row) from a DataFrame:
>>> df
id fk value
0 1 1 1000.0 # Int32, Int32, Float32
1 1 2 2000.0
>>> df.iloc[0]
id 1.0
fk 1.0
value 1000.0
Name: 0, dtype: Float64
So you have 2 columns of Int32 and 1 column of Float32. However, it's not possible to mix dtypes within a Series (or within a single column of a DataFrame). Pandas has to cast your Series to a common dtype that can hold all of your values. Here, that is Float64.
Now a different case:
df = pd.DataFrame([(1, 1, 1000), (1, 2, 2000)], columns=['id', 'fk', 'value']) \
.astype({'id': pd.Int8Dtype(), 'fk': pd.Int16Dtype(), 'value': pd.Int32Dtype()})
>>> df.dtypes
id Int8
fk Int16
value Int32
dtype: object
>>> df.iloc[0]
id 1
fk 1
value 1000
Name: 0, dtype: Int32
In this case, Pandas finds a common dtype (a superset) that can hold the values of all three columns: Int32, the widest of the integer types involved.
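As a rough illustration of where those two results come from, NumPy's own promotion rules give the same answers for the corresponding non-nullable types. This is only an analogy: it assumes the nullable pandas dtypes are promoted like their NumPy counterparts, which matches the outputs above but is not a statement about pandas internals.

```python
import numpy as np

# First case: two int32 columns and one float32 column.
# float32 cannot represent every int32 value exactly, so NumPy
# promotes the combination to float64.
print(np.result_type(np.int32, np.int32, np.float32))  # float64

# Second case: integers of different widths.
# The widest integer type involved is enough to hold all values.
print(np.result_type(np.int8, np.int16, np.int32))  # int32
```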
Upvotes: 1