I want to convert the dtypes of my DataFrame with convert_dtypes(), but when I then try to store it via HDFStore I get: AttributeError: 'IntegerArray' object has no attribute 'size'
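import pandas as pd

df = pd.DataFrame()
df["test"] = [0, 1, 2, 3]
df["test1"] = [0, 1, 2, 3.5]
df = df.convert_dtypes()  # "test" becomes the nullable Int64 dtype

store = pd.HDFStore(r"C:\Users\User\Desktop\test.h5")
store["test"] = df  # raises AttributeError: 'IntegerArray' object has no attribute 'size'
store.close()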
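A quick way to see what HDFStore is being handed is to compare the dtypes before and after convert_dtypes(). This is a small sketch (no file I/O); the Float64 result for the second column assumes pandas >= 1.2, where convert_dtypes also produces nullable float dtypes.

```python
import pandas as pd

df = pd.DataFrame({"test": [0, 1, 2, 3], "test1": [0, 1, 2, 3.5]})
# Before conversion both columns use plain NumPy dtypes (int64 and float64)
print(df.dtypes)

converted = df.convert_dtypes()
# After conversion "test" is the nullable extension dtype Int64 (backed by an
# IntegerArray); on pandas >= 1.2 "test1" becomes the nullable Float64 dtype
print(converted.dtypes)
```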
I experienced the same issue. An IntegerArray (the storage behind the nullable Int8/Int16/Int32/Int64 dtypes) can represent missing values, much like float64 can hold NaN, which plain NumPy integer dtypes in pandas cannot. However, HDFStore fails when writing this dtype. See here (https://github.com/pandas-dev/pandas/issues/26144). If your columns contain no missing values, a simple and quick solution is to cast them back to the NumPy integer dtypes:
for col in df.columns:
    col_dtype = df[col].dtype
    try:
        # Map each nullable integer dtype back to its NumPy counterpart
        if col_dtype == pd.Int8Dtype():
            df[col] = df[col].astype('int8')
        elif col_dtype == pd.Int16Dtype():
            df[col] = df[col].astype('int16')
        elif col_dtype == pd.Int32Dtype():
            df[col] = df[col].astype('int32')
        elif col_dtype == pd.Int64Dtype():
            df[col] = df[col].astype('int64')
    except (TypeError, ValueError):
        # A column with missing values cannot be cast to a NumPy int; leave it unchanged
        pass
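A more compact variant is sketched below, assuming pandas >= 1.2. There, convert_dtypes also produces the nullable Float64 dtype, which the loop above leaves untouched and which may run into the same problem. The helper name to_numpy_dtypes is hypothetical. It derives the NumPy dtype from the extension dtype's name (Int64 -> int64, UInt8 -> uint8, Float64 -> float64) instead of listing each type. Unlike the loop above, it casts integer columns that contain missing values to float64 (missing values become NaN) rather than skipping them; that is a design choice of this sketch, not something taken from the linked issue. It does not handle the nullable boolean or string dtypes.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_float_dtype, is_integer_dtype


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast nullable Int*/UInt*/Float* columns to NumPy dtypes.

    Assumes the masked dtypes that convert_dtypes() produces; other extension
    dtypes (boolean, string, categorical, ...) are left unchanged.
    """
    out = df.copy()  # do not mutate the caller's DataFrame
    for col in out.columns:
        dtype = out[col].dtype
        if not is_extension_array_dtype(dtype):
            continue  # already a plain NumPy dtype
        if is_integer_dtype(dtype):
            if out[col].isna().any():
                # NumPy ints cannot hold missing values, so fall back to float64 (NA -> NaN)
                out[col] = out[col].astype("float64")
            else:
                # e.g. "Int64" -> "int64", "UInt8" -> "uint8"
                out[col] = out[col].astype(dtype.name.lower())
        elif is_float_dtype(dtype):
            # e.g. "Float64" -> "float64"; missing values become NaN
            out[col] = out[col].astype(dtype.name.lower())
    return out
```

Usage would be `store["test"] = to_numpy_dtypes(df)`, keeping the nullable dtypes in memory and converting only for storage.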