Pandas HDFStore for out-of-core sequential read/write of sets with variable sizes

Question

I want to read and write data to an HDF5 file incrementally, because the data doesn't fit into memory.

The data consists of sets of integers of varying sizes. I only need to read and write the sets sequentially (set1, then set2, then set3, and so on); random access is not needed.

The problem is that I can't retrieve the sets by index.

import pandas as pd

# Write two sets of different sizes to the same key
x = pd.HDFStore('test.hf', 'w', append=True)
a = pd.Series([1])
x.append('dframe', a, index=True)
b = pd.Series([10, 2])
x.append('dframe', b, index=True)
x.close()

# Read everything back, then try to select the first set
x = pd.HDFStore('test.hf', 'r')
print(x['dframe'])
y = x.select('dframe', start=0, stop=1)
print("selected:", y)
x.close()

Output:

0     1
0    10
1     2
dtype: int64
selected: 0    1
dtype: int64

The select call returns only the first row. It doesn't select my 0th set, which is {1,10}.
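The output above can be reproduced in memory without HDF5, which may help show what is going on. This is only a sketch under two assumptions: that the `start`/`stop` arguments of `select` count row positions (as the pandas documentation describes them), and that each appended Series keeps its own 0-based index labels. The `set_id` column and the `tagged` frame are hypothetical names, shown as one possible layout that records which set each value belongs to.

```python
import pandas as pd

a = pd.Series([1])
b = pd.Series([10, 2])

# Appending a and then b to the same key stacks their rows;
# each Series keeps its own labels, so the label 0 appears twice.
stacked = pd.concat([a, b])

by_position = stacked.iloc[0:1]  # rows 0..0 by position, like start=0, stop=1
by_label = stacked.loc[0]        # every row whose index label is 0

# Hypothetical alternative: tag each value with the id of its set,
# so the sets can be recovered regardless of index labels.
tagged = pd.concat(
    [pd.DataFrame({"set_id": i, "value": s.to_numpy()}) for i, s in enumerate([a, b])],
    ignore_index=True,
)
sets = [set(g["value"].tolist()) for _, g in tagged.groupby("set_id", sort=True)]

print(by_position.tolist(), by_label.tolist(), sets)
```

Under this layout, a frame like `tagged` could be appended to the store chunk by chunk and the sets read back in `set_id` order, though whether that fits the real data sizes is untested here.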
