Reputation: 1181
I need an automated, reliable way to find the data type of each column in a pandas data frame. I have been using .dtype but have noticed something unexpected with it.
Consider this 10-row column of a data frame:
df['a']
Out[6]:
0 250.00
1 750.00
2 0.00
3 0.00
4 0.00
5 0.00
6 0.00
7 0.00
8 0.00
9 0.00
Name: a, dtype: object
type(df['a'][0])
Out[9]: decimal.Decimal
Why is the dtype of the entire column 'object' when each entry is a Decimal? I really need it to say decimal or float or something numeric. Any help would be appreciated!
Upvotes: 4
Views: 2836
Reputation: 394101
This is not an error but is due to the numpy dtype representation: https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html.
Basically, as Decimal is not one of the principal inbuilt scalar types, the column's dtype ends up being object, even though the actual type of each cell is still Decimal.
It's advised to use the inbuilt scalar types where possible, in this case float64, because arithmetic operations on an object column are unlikely to be vectorised, even though the values may well be numerical.
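As a minimal sketch of that advice, assuming the column holds only Decimal values and that losing Decimal's exact representation is acceptable, the column can be cast to float64 with astype. The data frame and the a_float column name here are made up for illustration:

```python
from decimal import Decimal

import pandas as pd

# Hypothetical data mirroring the first rows of the column in the question.
df = pd.DataFrame({"a": [Decimal("250.00"), Decimal("750.00"), Decimal("0.00")]})

# Decimal values are stored as generic Python objects.
print(df["a"].dtype)  # object

# Cast to the built-in float64 type. This gives up Decimal's exact
# representation in exchange for a numeric dtype and vectorised arithmetic.
df["a_float"] = df["a"].astype("float64")

print(df["a_float"].dtype)  # float64
print(df["a_float"].sum())  # 1000.0
```

If exact decimal arithmetic matters (for example with currency), keeping the Decimal objects and accepting the object dtype may be the better trade-off.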
The same is observed when you store str or datetime.date values: the dtype is object for these too.
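For the original goal of identifying column types automatically, one possible approach is pandas.api.types.infer_dtype, which inspects the values themselves rather than the storage dtype. This is only a sketch: the data frame and column names are invented, and str is left out because recent pandas releases may store strings in a dedicated string dtype rather than object.

```python
import datetime
from decimal import Decimal

import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical frame with two object-backed columns and one float column.
df = pd.DataFrame({
    "amount": [Decimal("250.00"), Decimal("0.00")],
    "day": [datetime.date(2020, 1, 1), datetime.date(2020, 1, 2)],
    "price": [1.5, 2.5],
})

# dtypes reports the storage type: object for the Decimal and date columns.
print(df.dtypes)

# infer_dtype looks at the actual values and returns a descriptive label.
inferred = {col: infer_dtype(df[col], skipna=True) for col in df.columns}
print(inferred)  # {'amount': 'decimal', 'day': 'date', 'price': 'floating'}
```

The labels are plain strings, so they can be compared directly, for example to pick out the columns that should be converted to float64.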
Upvotes: 8