Reputation: 64719
Once you create an h5py dataset, how do you add or remove specific rows or columns from an NxM array?
My question is similar to this one, but I don't want to blindly truncate or expand the array. When removing, I need to be able to specify the exact row or column to remove.
For adding, I know I have to specify maxshape=(None, None) when creating the initial dataset, but the resize method doesn't seem to let you specify which rows or columns get truncated if you shrink the size.
Upvotes: 6
Views: 4228
Reputation: 5471
h5py isn't really designed for doing this. Pandas might be a better library to use, as it's built around the concept of tables.
Having said that, here's how to do it:
In [1]: import h5py; f = h5py.File('test.h5', 'a')
In [2]: from numpy.random import rand; arr = rand(4,4)
In [3]: dset = f.create_dataset('foo',data=arr,maxshape=(2000,2000))
In [4]: dset[:]
Out[4]:
array([[ 0.29732874, 0.59310285, 0.61116263, 0.79950116],
       [ 0.4194363 , 0.4691813 , 0.95648712, 0.56120731],
       [ 0.76868585, 0.07556214, 0.39854704, 0.73415885],
       [ 0.0919063 , 0.0420656 , 0.35082375, 0.62565894]])
In [5]: dset[1:-1,:] = dset[2:,:]
In [6]: dset.resize((3,4))
In [7]: dset[:]
Out[7]:
array([[ 0.29732874, 0.59310285, 0.61116263, 0.79950116],
       [ 0.76868585, 0.07556214, 0.39854704, 0.73415885],
       [ 0.0919063 , 0.0420656 , 0.35082375, 0.62565894]])
This removes row 1 from dset. It does so by assigning rows 2 and 3 to rows 1 and 2, respectively, before shrinking the dataset by one row. Swap the subscripts to remove column 1 instead. You can easily write a wrapper around this if you're going to be doing it a lot.
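For what it's worth, here is a rough sketch of what such a wrapper might look like. The function names delete_index and insert_index are made up for illustration and are not part of h5py. It assumes a 2-D dataset created with a maxshape that allows the resize, and it copies the shifted block through memory, so it may be slow for very large datasets.

```python
import h5py
import numpy as np


def delete_index(dset, index, axis=0):
    """Hypothetical helper: drop one row (axis=0) or column (axis=1) in place."""
    n = dset.shape[axis]
    if not 0 <= index < n:
        raise IndexError(f"index {index} out of range for axis {axis} (size {n})")
    if index < n - 1:
        # Read everything after `index` into memory, then write it one slot earlier.
        if axis == 0:
            dset[index:n - 1, :] = dset[index + 1:n, :]
        else:
            dset[:, index:n - 1] = dset[:, index + 1:n]
    # Shrink the chosen axis by one; the stale last row/column is discarded.
    dset.resize(n - 1, axis=axis)


def insert_index(dset, index, values, axis=0):
    """Hypothetical helper: insert a row (axis=0) or column (axis=1) at `index`."""
    n = dset.shape[axis]
    if not 0 <= index <= n:
        raise IndexError(f"index {index} out of range for axis {axis} (size {n})")
    # Grow first (requires maxshape to permit it), then shift the tail one slot later.
    dset.resize(n + 1, axis=axis)
    if axis == 0:
        if index < n:
            dset[index + 1:n + 1, :] = dset[index:n, :]
        dset[index, :] = values
    else:
        if index < n:
            dset[:, index + 1:n + 1] = dset[:, index:n]
        dset[:, index] = values


if __name__ == "__main__":
    # In-memory file so the demo leaves nothing on disk.
    with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
        arr = np.arange(16, dtype="f8").reshape(4, 4)
        dset = f.create_dataset("foo", data=arr, maxshape=(None, None))
        delete_index(dset, 1, axis=0)   # remove row 1
        delete_index(dset, 2, axis=1)   # remove column 2
        insert_index(dset, 0, np.zeros(3), axis=0)  # new first row of zeros
        print(dset.shape)
```

The right-hand side of each shifting assignment is read into a NumPy array before anything is written, so the overlapping source and destination slices should not interfere with each other.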
Upvotes: 7