Reputation: 83
I'm working on a Windows system with a 64-bit version of Python (Python 3.10.13, packaged by Anaconda, Inc.). When I run Python, the header indicates that it's a 64-bit environment: "Python 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)] on win32". I am not sure why it shows win32, since I have a win64 machine.
I've verified the bitness in several ways. First, with the following code, which confirms that I'm running a 64-bit Python environment:
import platform
import sys
assert platform.architecture()[0] == "64bit"
assert sys.maxsize > 2**32
Next, I checked my conda version with conda info, which gives platform : win-64.
However, when I use Pandas (version 2.1.3) and create a Series with dtype=int:
import pandas as pd
print(pd.Series([1,2,3], dtype=int).dtype)
It shows 'int32' instead of 'int64'. I expected it to default to int64 in a 64-bit environment. If I do not specify a dtype, as in print(pd.Series([1,2,3]).dtype), it prints 'int64'.
Why is Pandas defaulting to int32 instead of int64 in my 64-bit Python environment, and how can I ensure that it defaults to int64?
I do not want to explicitly convert all my DataFrames with .astype("int64"), since that could result in failing tests on other machines.
Upvotes: 1
Views: 892
Reputation: 583
Your test fails because the data cast and the expected data have slightly different types, like int32 vs. int64. Something like assert_frame_equal(df1, df2, check_dtype='equiv') would be handy, but it does not work because pandas uses the hard check of assert_attr_equal under the hood.
You don't want to use assert_frame_equal(df1, df2, check_dtype=False) because it does not check the data type at all, which is bad.
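As background on where the int32 likely comes from: as far as I know, dtype=int is resolved through NumPy's platform default integer, which before NumPy 2.0 is a C long and therefore 32 bits wide on Windows, even in a 64-bit Python. The sketch below only illustrates that difference; the variable names are mine, and the first two printed values depend on your platform and NumPy version.

```python
import numpy as np
import pandas as pd

# Width of the integer type that the Python builtin `int` resolves to in NumPy.
# Assumption: 32 on Windows with NumPy < 2.0, 64 on most other setups.
default_int_bits = np.dtype(int).itemsize * 8

# dtype=int follows the platform default; dtype="int64" is always 64-bit.
platform_dependent = pd.Series([1, 2, 3], dtype=int)
fixed_width = pd.Series([1, 2, 3], dtype="int64")

print(default_int_bits)          # platform dependent
print(platform_dependent.dtype)  # platform dependent
print(fixed_width.dtype)         # int64
```

Spelling the width out as "int64" when the data is created gives the same dtype on every machine, without a later .astype call.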
My workaround is to cast columns with equivalent types into the same one in my tests.
import pandas as pd
a = pd.DataFrame({'Int': [1, 2, 3], 'Float': [0.57, 0.179, 0.213]}) # Automatic type casting
# Force 32-bit
b = a.copy()
b['Int'] = b['Int'].astype('int32')
b['Float'] = b['Float'].astype('float32')
# Force 64-bit
c = a.copy()
c['Int'] = c['Int'].astype('int64')
c['Float'] = c['Float'].astype('float64')
try:
    pd.testing.assert_frame_equal(b, c)
    print('Success')
except AssertionError as err:
    print(err)
gives:
Attributes of DataFrame.iloc[:, 0] (column name="Int") are different
Attribute "dtype" are different
[left]: int32
[right]: int64
Workaround function:
def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Convert equivalent data types to the same type before comparing."""
    # First, check that the columns are the same.
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    # Work on a copy so the caller's DataFrame is not modified.
    left = left.copy()
    # Knowing the column names are the same, cast to the same data type if equivalent.
    for col_name in left.columns:
        lcol = left[col_name]
        rcol = right[col_name]
        if (
            (pd.api.types.is_integer_dtype(lcol) and pd.api.types.is_integer_dtype(rcol))
            or (pd.api.types.is_float_dtype(lcol) and pd.api.types.is_float_dtype(rcol))
        ):
            left[col_name] = lcol.astype(rcol.dtype)
    pd.testing.assert_frame_equal(left, right, check_like=True)
try:
    assert_frame_equiv(b, c)
    print('Success')
except AssertionError as err:
    print(err)
Which gives:
Success
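To show that this is stricter than switching the dtype check off, here is a small self-contained sketch. It restates the workaround function so the block runs on its own (working on a copy of left), and adds a hypothetical helper, frames_equiv, that turns the assertion into a boolean. The frame names are made up for the illustration.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def assert_frame_equiv(left: pd.DataFrame, right: pd.DataFrame) -> None:
    """Compare frames, treating different integer (or float) widths as equivalent."""
    pd.testing.assert_index_equal(left.columns, right.columns, check_order=False)
    left = left.copy()  # do not modify the caller's frame
    for col_name in left.columns:
        lcol, rcol = left[col_name], right[col_name]
        both_int = is_integer_dtype(lcol) and is_integer_dtype(rcol)
        both_float = is_float_dtype(lcol) and is_float_dtype(rcol)
        if both_int or both_float:
            left[col_name] = lcol.astype(rcol.dtype)
    pd.testing.assert_frame_equal(left, right, check_like=True)


def frames_equiv(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if assert_frame_equiv passes, else False."""
    try:
        assert_frame_equiv(left, right)
    except AssertionError:
        return False
    return True


ints32 = pd.DataFrame({"Int": pd.Series([1, 2, 3], dtype="int32")})
ints64 = pd.DataFrame({"Int": pd.Series([1, 2, 3], dtype="int64")})
floats = pd.DataFrame({"Int": pd.Series([1.0, 2.0, 3.0], dtype="float64")})

width_only = frames_equiv(ints32, ints64)   # same kind, different width
kind_differs = frames_equiv(ints32, floats)  # integer vs. float

print(width_only)    # True
print(kind_differs)  # False
```

An int32 column matches an int64 one, but an integer column still fails against a float column, so the kind of the data is still checked.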
I opened a feature request to add check_dtype='equiv' to pandas.
Upvotes: 0