Cetarius

Reputation: 69

pandas.DataFrame.convert_dtypes with HDFStore leads to AttributeError?

I want to convert the dtypes of my df with convert_dtypes, but when I then try to store it via HDFStore I get this error: AttributeError: 'IntegerArray' object has no attribute 'size'

import pandas as pd

df = pd.DataFrame()
df["test"] = [0, 1, 2, 3]
df["test1"] = [0, 1, 2, 3.5]
df = df.convert_dtypes()
store = pd.HDFStore(r"C:\Users\User\Desktop\test.h5")
store["test"] = df  # AttributeError: 'IntegerArray' object has no attribute 'size'
store.close()

Upvotes: 2

Views: 943

Answers (1)

Olivier

Reputation: 465

I experienced the same issue. An IntegerArray (pandas' nullable integer type) can represent missing values, much like float64 can hold NaN, which the plain NumPy int dtypes cannot. However, this extension dtype is exactly what fails when writing to HDF; see https://github.com/pandas-dev/pandas/issues/26144. If your columns contain no missing values, the following is a simple and quick workaround:

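# Cast each nullable integer column back to the matching NumPy int dtype.
for col in df.columns:
    col_dtype = df[col].dtype
    try:
        if col_dtype == pd.Int8Dtype():
            df[col] = df[col].astype('int8')
        elif col_dtype == pd.Int16Dtype():
            df[col] = df[col].astype('int16')
        elif col_dtype == pd.Int32Dtype():
            df[col] = df[col].astype('int32')
        elif col_dtype == pd.Int64Dtype():
            df[col] = df[col].astype('int64')
    except (TypeError, ValueError):
        # The cast failed (for example, the column holds missing values),
        # so leave the column as it is.
        pass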
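
Note that in the question's example, recent pandas versions also turn the test1 column into the nullable Float64 dtype, which the loop above does not touch and which may hit the same problem. A more general version of the same idea could look like the sketch below. The names to_numpy_dtypes, _INT_NAMES and _FLOAT_NAMES are made up for illustration, and falling back to float64 for integer columns that contain missing values is my own assumption rather than something pandas prescribes (very large integers lose precision in float64).

```python
import pandas as pd

# Nullable extension dtype name -> plain NumPy dtype name (illustrative mapping).
_INT_NAMES = {
    f"{prefix}{bits}": f"{prefix.lower()}{bits}"
    for prefix in ("Int", "UInt")
    for bits in (8, 16, 32, 64)
}
_FLOAT_NAMES = {"Float32": "float32", "Float64": "float64"}


def to_numpy_dtypes(df):
    """Return a copy of df with nullable int/float columns cast to NumPy dtypes."""
    out = df.copy()
    for col in out.columns:
        name = out[col].dtype.name
        if name in _FLOAT_NAMES:
            # Missing values become NaN in the NumPy float column.
            out[col] = out[col].astype(_FLOAT_NAMES[name])
        elif name in _INT_NAMES:
            # NumPy ints cannot hold missing values, so fall back to float64
            # when any are present (assumption: NaN is acceptable there).
            has_missing = out[col].isna().any()
            out[col] = out[col].astype("float64" if has_missing else _INT_NAMES[name])
    return out


df = pd.DataFrame({"test": [0, 1, 2, 3], "test1": [0, 1, 2, 3.5]}).convert_dtypes()
plain = to_numpy_dtypes(df)
print(plain.dtypes)
```

With that, store["test"] = to_numpy_dtypes(df) writes only NumPy-backed columns, while the original df keeps its nullable dtypes. String or boolean extension columns are left untouched here and would need their own handling.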


Upvotes: 2
