Reputation: 529
I am facing the same problem as the one raised in How to trouble-shoot HDFStore Exception: cannot find the correct atom type.
I reduced it to the example given in the pandas documentation, Storing Mixed Types in a Table.
The whole point of this example is to append a DataFrame with some missing values to an HDFStore. When I use the example code I end up with an atom type error.
df_mixed
Out[103]:
A B C bool datetime64 int string
0 -0.065617 -0.062644 -0.004758 True 2001-01-02 00:00:00 1 string
1 1.444643 1.664311 -0.189095 True 2001-01-02 00:00:00 1 string
2 0.569412 -0.077504 -0.125590 True 2001-01-02 00:00:00 1 string
3 NaN NaN 0.563939 True NaN 1 NaN
4 NaN NaN -0.618218 True NaN 1 NaN
5 NaN NaN 1.477307 True NaN 1 NaN
6 -0.287331 0.984108 -0.514628 True 2001-01-02 00:00:00 1 string
7 -0.244192 0.239775 0.861359 True 2001-01-02 00:00:00 1 string
store=HDFStore('df.h5')
store.append('df_mixed', df_mixed, min_itemsize={'values':50})
...
Exception: cannot find the correct atom type -> [dtype->object,items->Index([datetime64, string], dtype=object)] object of type 'Timestamp' has no len()
If I enforce dtype for the problematic columns (actually the object ones) as suggested in the linked post (Jeff's answer), I still get the same error. What am I missing here?
dtypes = [('datetime64', '|S20'), ('string', '|S20')]
store=HDFStore('df.h5')
store.append('df_mixed', df_mixed, dtype=dtypes, min_itemsize={'values':50})
...
Exception: cannot find the correct atom type -> [dtype->object,items->Index([datetime64, string], dtype=object)] object of type 'Timestamp' has no len()
Thanks for insights
SOLVED
I was using pandas 0.10 and switched to 0.11-dev. As Jeff inferred, the trouble was with NaN vs NaT.
With the former pandas version, the assignment
df_mixed.ix[3:5,['A', 'B', 'string', 'datetime64']] = np.nan
left NaN in the datetime64 column:
2 0.569412 -0.077504 -0.125590 True 2001-01-02 00:00:00 1 string
3 NaN NaN 0.563939 True NaN 1 NaN
while the latter version produces NaT there:
2 0.569412 -0.077504 -0.125590 True 2001-01-02 00:00:00 1 string
3 NaN NaN 0.563939 True NaT 1 NaN
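For anyone who wants to check which of the two cases they are in, here is a minimal sketch. It assumes a recent pandas (where .ix no longer exists, so .loc is used instead) and a small made-up frame rather than the full df_mixed from the docs. If the column really has a datetime dtype, assigning np.nan should be stored as NaT, and the column dtype should stay datetime.

```python
import numpy as np
import pandas as pd

# Small stand-in for df_mixed (illustrative values, not the docs example)
df_mixed = pd.DataFrame({
    "A": [0.1, 0.2, 0.3],
    "datetime64": pd.to_datetime(["2001-01-02"] * 3),
    "string": ["string"] * 3,
})

# Blank out one row, column by column
df_mixed.loc[1, "A"] = np.nan
df_mixed.loc[1, "datetime64"] = np.nan

# The float column holds NaN, the datetime column holds NaT
print(df_mixed.dtypes)
print(df_mixed)
```

If the dtypes output shows object for the datetime column instead of a datetime64 dtype, the column contains a mix of Timestamp and float NaN, which is the situation that triggered the atom type error above.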
Upvotes: 1
Views: 768
Reputation: 129058
The problem is the NaN values in your datetime64[ns] series. These MUST be NaT. How did you construct this frame? What pandas version are you using?
Can you use 0.11-dev? (there are several more options here). Try this:
df['datetime64'] = Series(df['datetime64'],dtype='M8[ns]')
In addition, here are some more useful links: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore
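As a rough illustration of the same idea on a current pandas, the sketch below builds a hypothetical object column that mixes Timestamp values with float NaN and converts it with pd.to_datetime. This swaps in pd.to_datetime for the Series(..., dtype=...) call above; the frame and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an object column mixing Timestamps with float NaN
df = pd.DataFrame({
    "datetime64": pd.Series(
        [pd.Timestamp("2001-01-02"), np.nan, pd.Timestamp("2001-01-02")],
        dtype=object,
    ),
    "int": [1, 1, 1],
})
before_dtype = df["datetime64"].dtype  # object: the problematic state

# Convert to a real datetime column; the NaN becomes NaT
df["datetime64"] = pd.to_datetime(df["datetime64"])
after_dtype = df["datetime64"].dtype
missing = df["datetime64"].iloc[1]

print(before_dtype, "->", after_dtype)
print(df)
```

After the conversion the column no longer has object dtype, so it should no longer be reported among the object items in the atom type error.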
Upvotes: 2