Reputation: 241
I want to read and write data to an HDF5 file incrementally because the data doesn't fit into memory.
The data consists of sets of integers. I only need to read and write the sets sequentially (set1, then set2, then set3, and so on), so random access is not needed.
The problem is that I can't retrieve the sets by index.
import pandas as pd
x = pd.HDFStore('test.hf', 'w', append=True)
a = pd.Series([1])
x.append('dframe', a, index=True)
b = pd.Series([10,2])
x.append('dframe', b, index=True)
x.close()
x = pd.HDFStore('test.hf', 'r')
print(x['dframe'])
y=x.select('dframe',start=0,stop=1)
print("selected:", y)
x.close()
Output:
0 1
0 10
1 2
dtype: int64
selected: 0 1
dtype: int64
It doesn't select my 0th set, which is {1, 10}; it returns only the first row.
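For context, `start` and `stop` in `select` count rows of the stored table rather than index labels, and each appended Series keeps its own 0-based index, so the stored labels end up as 0, 0, 1. Here is a minimal in-memory sketch of that distinction. It uses `pd.concat` as a stand-in for the two appends (an assumption for illustration only; no HDF5 file is involved):

```python
import pandas as pd

# Stand-in for the two appends: each Series keeps its own 0-based index,
# so the combined labels are 0, 0, 1.
a = pd.Series([1])
b = pd.Series([10, 2])
stored = pd.concat([a, b])

# Positional slice, analogous to select(start=0, stop=1): the first row only.
by_position = stored.iloc[0:1]

# Label match, analogous to a where-clause on the index: every row labelled 0.
by_label = stored[stored.index == 0]

print(by_position.tolist())  # [1]
print(by_label.tolist())     # [1, 10]
```

So a positional slice can only return the whole "set" if you already know which row range it occupies.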
Upvotes: 1
Views: 314
Reputation: 241
This way works, but I don't know how fast it is.
Does it scan the whole file to find the rows with the given index?
That would be quite a waste of time.
import pandas as pd
x = pd.HDFStore('test.hf', 'w', append=True, format="table", complevel=9)
a = pd.Series([1])
x.append('dframe', a, index=True)
b = pd.Series([10,2])
x.append('dframe', b, index=True)
x.close()
x = pd.HDFStore('test.hf', 'r')
print(x['dframe'])
y=x.select('dframe','index == 0')
print('selected:')
for i in y:
    print(i)
x.close()
Output:
0 1
0 10
1 2
dtype: int64
selected:
1
10
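A variation that may be easier to reason about is to tag every row with an explicit set number and query on that column instead of relying on repeated index labels. This is only a sketch under assumptions: the file name, the key `sets`, the column names `set_id` and `value`, and the helpers `append_set` and `read_set` are all made up for illustration, and PyTables must be installed. As far as I know, `append` builds a PyTables index on the declared data columns by default, so the `where` query should not need a full scan, but I have not benchmarked it.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "sets.h5")


def append_set(store, set_id, values):
    """Append one set as table rows, tagging every row with its set_id."""
    frame = pd.DataFrame({"set_id": set_id, "value": list(values)})
    # data_columns makes set_id queryable in a where-clause.
    store.append("sets", frame, format="table", data_columns=["set_id"])


def read_set(store, set_id):
    """Return the values of one set as a sorted list."""
    rows = store.select("sets", where=f"set_id == {set_id}")
    return sorted(rows["value"].tolist())


# Write the sets one at a time; only one set is in memory at any moment.
with pd.HDFStore(path, mode="w") as store:
    append_set(store, 0, [1, 10])
    append_set(store, 1, [2])

# Read them back sequentially by set number.
loaded = {}
with pd.HDFStore(path, mode="r") as store:
    for set_id in range(2):
        loaded[set_id] = read_set(store, set_id)

print(loaded)
```

Since the sets are only ever read in order, this keeps the memory use bounded by the largest single set.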
Upvotes: 1