Pandas to_hdf fails on dataframes containing nullable int dtypes (e.g. Int8Dtype)

Question

I'm trying to reduce the memory consumption of some large data that we work with, so that more data can be appended to it without throwing memory errors. Downcasting floats where possible helps a little, but the major savings I've found have come from casting float64 columns to Int8 and Int16 where possible. This data contains NaNs. This is unavoidable, and in context there is no value I can replace NaNs with that doesn't change the meaning of the data. The new nullable dtypes are great for this, but I get ValueError: cannot convert float NaN to integer when trying to save the resulting frames to HDF.
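To illustrate the kind of saving described above, here is a minimal sketch that compares a float64 column with the same values stored as nullable Int8. The sample data is made up for illustration, and the exact byte counts will depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: whole-number floats in 0..99 with some NaNs.
values = np.arange(100_000, dtype="float64") % 100
values[::10] = np.nan
s_float = pd.Series(values)

# Cast to the nullable Int8 extension dtype; NaN entries become missing values.
s_int8 = s_float.astype("Int8")

# Compare the memory used by the values only (the index is excluded).
float_bytes = s_float.memory_usage(index=False, deep=True)
int8_bytes = s_int8.memory_usage(index=False, deep=True)
print(f"float64: {float_bytes} bytes, Int8: {int8_bytes} bytes")
```

The nullable column stores a one-byte value plus a mask entry per row, so it stays well below the eight bytes per row that float64 needs, while still keeping the missing values.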

I've tried using to_hdf with and without specifying the table format, and I get different errors. Without specifying the table format, the error is AttributeError: 'NoneType' object has no attribute 'names'.

```
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, np.nan, 5], columns=['A'])
df.to_hdf('Z:/test.hd5', 'data')
# This works (column A is float64)

# Cast to the nullable Int8 dtype, then try to save again
df['A'] = df.A.astype(pd.Int8Dtype())
df.to_hdf('Z:/test.hd5', 'data')
# This fails with the traceback below

Traceback (most recent call last):

  File "", line 1, in
    df.to_hdf('Z:/test.hd5', 'data', complevel=9, complib='blosc:zlib')

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 2377, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 274, in to_hdf
    f(store)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 268, in
    f = lambda store: store.put(key, value, **kwargs)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 889, in put
    self._write_to_group(key, value, append=append, **kwargs)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 1415, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 3022, in write
    blk.values, items=blk_items)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 2750, in write_array
    atom = _tables().Atom.from_dtype(value.dtype)

  File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\tables\atom.py", line 381, in from_dtype
    if basedtype.names:

AttributeError: 'NoneType' object has no attribute 'names'
```

Is this a bug? An intentional limitation? Or have I done something dumb?
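A possible stopgap, not a confirmed fix: cast the nullable integer columns back to float64 just before writing, and restore them after reading. The sketch below assumes two helper names, to_storable and from_storable, which are hypothetical and not part of pandas. It leaves out the actual to_hdf and read_hdf calls so that it runs without PyTables.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_integer_dtype


def to_storable(df):
    """Hypothetical helper: copy df with nullable-integer columns cast to float64.

    Also returns a dict mapping each converted column to its original dtype name.
    """
    out = df.copy()
    original = {}
    for col in out.columns:
        dtype = out[col].dtype
        if is_extension_array_dtype(dtype) and is_integer_dtype(dtype):
            original[col] = str(dtype)             # e.g. "Int8"
            out[col] = out[col].astype("float64")  # missing values become NaN
    return out, original


def from_storable(df, original):
    """Hypothetical helper: restore the nullable-integer dtypes recorded above."""
    out = df.copy()
    for col, dtype_name in original.items():
        out[col] = out[col].astype(dtype_name)
    return out


df = pd.DataFrame({"A": [1, 2, 3, np.nan, 5]})
df["A"] = df["A"].astype(pd.Int8Dtype())

storable, original = to_storable(df)        # storable is what would be passed to to_hdf
restored = from_storable(storable, original)  # apply this to the frame returned by read_hdf
```

The trade-off is that the file on disk and the temporary copy use float64, so the memory saving only applies to the frame held in memory between reads and writes.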

Answers (1)
