Reputation: 61
I'm trying to reduce the memory consumption of some large data that we work with, so that more data can be appended to it without throwing memory errors. Downcasting floats where possible helps a little, but the major savings I#ve found have been from casting float64s the Int8 and Int16 where possible. This data contains NaNs. This is unavoidable, and in context there is no value I can replace NaNs with that doesn't change the meaning of the data. The new nullable dtypes are great for this, but I get ValueError: cannot convert float NaN to integer when trying to save the resulting frames to hdf.
I've tried using to_hdf with and without specifiying table format, and get different errors (without specifying table format the error is AttributeError: 'NoneType' object has no attribute 'names')
´´´
df=pd.DataFrame([1,2,3,np.nan,5], columns=['A'])
df.to_hdf('Z:/test.hd5', 'data')
#This works
df['A']=df.A.astype(pd.Int8Dtype())
df.to_hdf('Z:/test.hd5', 'data')
Traceback (most recent call last):
File "<ipython-input-51-6b0f3ad26286>", line 1, in <module>
df.to_hdf('Z:/test.hd5', 'data', complevel=9, complib='blosc:zlib')
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\core\generic.py", line 2377, in to_hdf
return pytables.to_hdf(path_or_buf, key, self, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 274, in to_hdf
f(store)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 268, in <lambda>
f = lambda store: store.put(key, value, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\io\pytables.py", line 889, in put
self._write_to_group(key, value, append=append, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 1415, in _write_to_group
s.write(obj=value, append=append, complib=complib, **kwargs)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3 \lib\site-packages\pandas\io\pytables.py", line 3022, in write
blk.values, items=blk_items)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\pytables.py", line 2750, in write_array
atom = _tables().Atom.from_dtype(value.dtype)
File "C:\Users\marnoch.hamilton-jon\AppData\Local\Continuum\anaconda3\lib\site-packages\tables\atom.py", line 381, in from_dtype
if basedtype.names:
AttributeError: 'NoneType' object has no attribute 'names'
´´´
Is this a bug? An intentional limitation? Or have I done something dumb?
Upvotes: 2
Views: 788