theta

Reputation: 25601

How to store dictionary in HDF5 dataset

I have a dictionary whose keys are datetime objects and whose values are tuples of integers:

>>> d.items()[0]
(datetime.datetime(2012, 4, 5, 23, 30), (14, 1014, 6, 3, 0))

I want to store it in an HDF5 dataset, but if I try to dump the dictionary directly, h5py raises an error:

TypeError: Object dtype dtype('object') has no native HDF5 equivalent

What would be "the best" way to transform this dictionary so that I can store it in an HDF5 dataset?

Specifically, I don't want to just dump the dictionary into a NumPy array, as that would complicate retrieving data by datetime query.

Upvotes: 37

Views: 71658

Answers (6)

Vaibhav Dixit

Reputation: 894

The previous answers aim to store a Python dictionary as an HDF5 dataset. The following code instead stores a Python dictionary as HDF5 attributes (metadata), which is a more logical method:

import h5py
import numpy as np

#Writing data
d1 = np.random.random(size=(1000, 20))  # Sample data
hf = h5py.File("test_data.h5", "w")
dset1 = hf.create_dataset("dataset_1", data=d1)
#set some metadata directly
hf.attrs["metadata1"] = 5

#sample dictionary object
sample_dict = {
    "metadata2": 1, "metadata3": 2, 
    "metadata4": "blah_blah"
}

#Store this dictionary object as hdf5 metadata
hf.attrs.update(sample_dict)
hf.close()

#Reading data
hf1 = h5py.File("test_data.h5", "r")
for name in hf1:
    print(name)

print(hf1.attrs.keys())
hf1.close()

This gives the following output:

dataset_1
<KeysViewHDF5 ['metadata1', 'metadata2', 'metadata3', 'metadata4']>

This shows that metadata1, which was assigned directly as an attribute, and metadata2, metadata3, and metadata4, which come from the dictionary object, are all stored side by side as attributes.
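To get the attributes back as a dictionary, a minimal sketch along these lines should work. The file path and variable names are illustrative, not part of the answer above, and the conversion step assumes numeric attributes come back as NumPy scalars, which is what h5py normally returns.

```python
import os
import tempfile

import h5py

# Hypothetical file path in a temporary directory, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "attrs_roundtrip.h5")

sample_dict = {"metadata2": 1, "metadata3": 2, "metadata4": "blah_blah"}

# Write the dictionary as attributes of the file's root group
with h5py.File(path, "w") as hf:
    hf.attrs.update(sample_dict)

# The attribute manager behaves like a mapping, so dict() copies it out
with h5py.File(path, "r") as hf:
    restored = dict(hf.attrs)

# Numeric values come back as NumPy scalars; .item() turns them into plain Python numbers
restored_plain = {
    key: (value.item() if hasattr(value, "item") else value)
    for key, value in restored.items()
}
print(restored_plain)
```

Attributes are meant for small pieces of metadata, so this suits short dictionaries of scalars and strings better than large collections of records.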

Upvotes: 10

YScharf

Reputation: 2027

Another option would be to use the HDF5 group feature (see the h5py documentation on groups).

Sample code:

Save dictionary to h5:

import h5py
import numpy as np

dict_test = {'a': np.ones((100,100)), 'b': np.zeros((100,100))}
hf = h5py.File('dict_data.h5', 'w')
dict_group = hf.create_group('dict_data')
# Each dictionary key becomes a dataset inside the group
for k, v in dict_test.items():
    dict_group[k] = v
hf.close()

Then to load the data back into a dictionary:

dict_new = {}
# The with block closes the file once the data has been read
with h5py.File('dict_data.h5', 'r') as hf:
    dict_group_load = hf['dict_data']
    for k in dict_group_load.keys():
        # [:] reads the whole dataset into a NumPy array
        dict_new[k] = dict_group_load[k][:]

Upvotes: 3

Ameet Deshpande

Reputation: 536

This question relates to the more general problem of storing any type of dictionary in HDF5 format. First, convert the dictionary to a string. Then, to recover the dictionary, import the ast library and pass the string to ast.literal_eval. The following code gives an example:

>>> import ast
>>> d = {1:"a",2:"b"}
>>> s = str(d)
>>> s
"{1: 'a', 2: 'b'}"
>>> ast.literal_eval(s)
{1: 'a', 2: 'b'}
>>> type(ast.literal_eval(s))
<class 'dict'>
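
For the dictionary in the question, the string round trip needs one extra step, because ast.literal_eval only accepts Python literals and the repr of a datetime is a constructor call. The sketch below is one possible way to handle that, assuming the keys are first converted to ISO 8601 strings. The attribute name "dict_repr" and the file path are made up for illustration.

```python
import ast
import os
import tempfile
from datetime import datetime

import h5py

d = {datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# Turn the datetime keys into ISO 8601 strings so the whole dict is made of literals
literal_dict = {key.isoformat(): value for key, value in d.items()}

# Hypothetical file path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "dict_as_string.h5")

# Store the string form as an attribute ("dict_repr" is an illustrative name)
with h5py.File(path, "w") as hf:
    hf.attrs["dict_repr"] = str(literal_dict)

# Read the string back and parse it safely, without eval()
with h5py.File(path, "r") as hf:
    loaded = ast.literal_eval(hf.attrs["dict_repr"])

# Rebuild the datetime keys (datetime.fromisoformat needs Python 3.7+)
restored = {datetime.fromisoformat(key): value for key, value in loaded.items()}
print(restored)
```

Note that this stores the dictionary as one opaque string, so a lookup by date requires loading and parsing the whole thing first.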

Upvotes: 16

wordsforthewise

Reputation: 15787

Nowadays we have deepdish (www.deepdish.io):

import deepdish as dd
dd.io.save(filename, {'dict1': dict1, 'dict2': dict2}, compression=('blosc', 9))

Upvotes: 4

Jason S

Reputation: 189686

I would serialize the object into JSON or YAML and store the resulting string as an attribute in the appropriate object (HDF5 group or dataset).
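
A rough sketch of the JSON variant, using h5py and the dictionary from the question, could look like this. The group name "records", the attribute name "lookup_json", and the file path are assumptions for illustration. JSON has no datetime or tuple type, so the keys are written as ISO 8601 strings and the tuples as lists, then converted back on load.

```python
import json
import os
import tempfile
from datetime import datetime

import h5py

d = {datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0)}

# JSON keys must be strings and JSON has no tuples, so convert both
payload = json.dumps({key.isoformat(): list(value) for key, value in d.items()})

# Hypothetical file path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "dict_as_json.h5")

with h5py.File(path, "w") as hf:
    group = hf.create_group("records")       # illustrative group name
    group.attrs["lookup_json"] = payload     # illustrative attribute name

with h5py.File(path, "r") as hf:
    loaded = json.loads(hf["records"].attrs["lookup_json"])

# Restore the original key and value types (fromisoformat needs Python 3.7+)
restored = {datetime.fromisoformat(key): tuple(value) for key, value in loaded.items()}
print(restored)
```

Since attributes are intended for small metadata, this fits modest dictionaries. A large set of records is probably better kept in datasets.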

I'm not sure why you're using the datetime as a dataset name, however, unless you absolutely need to look up your dataset directly by datetime.

p.s. For what it's worth, PyTables is a lot easier to use than the low-level h5py.

Upvotes: 6

theta

Reputation: 25601

I found two ways to do this:

I) Transform the datetime object to a string and use it as the dataset name

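# Open in append mode explicitly; recent h5py versions default to read-only
h = h5py.File('myfile.hdf5', 'a')
for k, v in d.items():
    # int8 cannot hold values such as 1014, so use a wider integer type
    h.create_dataset(k.strftime('%Y-%m-%dT%H:%M:%SZ'), data=np.array(v, dtype=np.int32))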

where data can be accessed by querying the key strings (dataset names). For example:

for ds in h.keys():
    if '2012-04' in ds:
        # [()] reads the whole dataset; the older .value attribute was removed in h5py 3
        print(h[ds][()])
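
Because these names have a fixed width and run from year down to second, they sort chronologically as plain strings. A date range query can therefore be sketched as a string comparison, as below. The second record, the file path, and the variable names are invented for the example.

```python
import os
import tempfile
from datetime import datetime

import h5py
import numpy as np

FMT = "%Y-%m-%dT%H:%M:%SZ"

d = {
    datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime(2012, 5, 1, 0, 0): (15, 1009, 7, 2, 1),  # made-up second record
}

# Hypothetical file path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "by_name.h5")

with h5py.File(path, "w") as h:
    for key, value in d.items():
        h.create_dataset(key.strftime(FMT), data=np.array(value, dtype=np.int32))

# Format the range bounds the same way as the dataset names
start = datetime(2012, 4, 1).strftime(FMT)
end = datetime(2012, 5, 1).strftime(FMT)

with h5py.File(path, "r") as h:
    # Half-open range: start <= name < end
    selected = {name: h[name][()].tolist() for name in h if start <= name < end}

print(selected)
```

This still walks every dataset name in the file, so it is a convenience rather than an index.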

II) Transform the datetime object to dataset subgroups

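# Open in append mode explicitly; recent h5py versions default to read-only
h = h5py.File('myfile.hdf5', 'a')
for k, v in d.items():
    # int8 cannot hold values such as 1014, so use a wider integer type
    h.create_dataset(k.strftime('%Y/%m/%d/%H:%M'), data=np.array(v, dtype=np.int32))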

Notice the forward slashes in the strftime string, which create the appropriate subgroups in the HDF5 file. Data can be accessed directly, like h['2012']['04']['05']['23:30'][()], or by iterating with the iterators h5py provides, or even by using custom functions through visititems().
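
As an illustration of the visititems() route, the sketch below collects every dataset stored under April 2012. The callback name collect_april, the second record, and the file path are hypothetical. The callback receives each object's path relative to the group being visited.

```python
import os
import tempfile
from datetime import datetime

import h5py
import numpy as np

FMT = "%Y/%m/%d/%H:%M"

d = {
    datetime(2012, 4, 5, 23, 30): (14, 1014, 6, 3, 0),
    datetime(2012, 5, 1, 0, 0): (15, 1009, 7, 2, 1),  # made-up second record
}

# Hypothetical file path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "by_subgroup.h5")

with h5py.File(path, "w") as h:
    for key, value in d.items():
        # Slashes in the name create the intermediate year/month/day groups
        h.create_dataset(key.strftime(FMT), data=np.array(value, dtype=np.int32))

found = {}

def collect_april(name, obj):
    # name looks like "2012/04/05/23:30"; groups are visited too, so filter on type
    if isinstance(obj, h5py.Dataset) and name.startswith("2012/04/"):
        found[datetime.strptime(name, FMT)] = tuple(obj[()].tolist())
    # Returning None tells h5py to keep visiting

with h5py.File(path, "r") as h:
    h.visititems(collect_april)

print(found)
```

Calling visititems() on a subgroup such as h['2012/04'] would narrow the walk, in which case the names passed to the callback are relative to that subgroup.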

For simplicity I chose the first option.

Upvotes: 18
