Weight decay

Reputation: 21

What is the standard way to store a list of variable-length strings in HDF5?

Assuming I want everything to be automatic and use the defaults, I can do something like this:

import h5py

with h5py.File('store_str_2.hdf5', 'w') as hf:
    variable_length_str = ['abcd', 'bce', 'cd']
    hf.create_dataset('variable_length_str', data=variable_length_str)

But on the Internet, I find solutions like this:

import h5py
import numpy as np

with h5py.File('store_str.hdf5', 'w') as hf:
    dt = h5py.special_dtype(vlen=str)
    variable_length_str = np.array(['abcd', 'bce', 'cd'], dtype=dt)
    hf.create_dataset('variable_length_str', data=variable_length_str)

So what is the difference between the two? Why not just use the simple one to store a list of variable-length strings? Does it have consequences, such as taking more space?

Another question: if I want to save space (with compression), what is the better way to store a list of strings in HDF5?

Upvotes: 1

Views: 968

Answers (1)

kcw78

Reputation: 8006

Q1: What is the difference between the two?

h5py is designed to use NumPy arrays to hold HDF5 data, so the typical behavior is fixed-length strings (S10, for example). The dtype you found is the older h5py API for variable-length strings. The current API is h5py.string_dtype(encoding=, length=), with length=None for variable-length strings. Note: the same limitation applies to variable-length (aka 'ragged') arrays and their associated dtype.
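As a rough illustration of the current API mentioned above, here is a minimal sketch that writes the list from the question with h5py.string_dtype and reads it back. The function name roundtrip_vlen_strings and the temporary file path are made up for this example, and it assumes h5py 2.10 or later.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip_vlen_strings(path, strings):
    """Write strings as a variable-length UTF-8 dataset, then read them back.

    Hypothetical helper for illustration only; not part of h5py.
    """
    # length=None requests variable-length strings.
    dt = h5py.string_dtype(encoding='utf-8', length=None)
    with h5py.File(path, 'w') as hf:
        hf.create_dataset('variable_length_str',
                          data=np.array(strings, dtype=dt))
    with h5py.File(path, 'r') as hf:
        raw = hf['variable_length_str'][:]
    # Depending on the h5py version, elements come back as bytes or str,
    # so normalise everything to str before returning.
    return [s.decode('utf-8') if isinstance(s, bytes) else s for s in raw]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'store_str.hdf5')
        print(roundtrip_vlen_strings(path, ['abcd', 'bce', 'cd']))
```

The explicit dtype makes the variable-length choice visible in the code instead of relying on whatever the installed h5py version infers from a plain list.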

Q2: Why not just use the simple one to store a list of variable-length strings?
Q3: Will it have consequences, such as taking more space?

You can use the simple string dtype, but all saved strings will be the same length. You have to allocate enough space for the longest string you want to save; shorter strings will be padded (NumPy S dtypes pad with null bytes).
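To make the fixed-length case concrete, here is a small sketch that sizes an S dtype to the longest string and reports the per-element size stored in the file. The function name roundtrip_fixed_strings is hypothetical, the input is assumed to be a non-empty list of ASCII-only strings, and compression='gzip' is included only to show that the standard filter option can be combined with a fixed-length dtype; I have not measured what it saves.

```python
import os
import tempfile

import h5py
import numpy as np


def roundtrip_fixed_strings(path, strings):
    """Write strings as fixed-length byte strings sized to the longest entry.

    Hypothetical helper for illustration only; assumes non-empty ASCII input.
    Returns (bytes reserved per element, strings read back).
    """
    encoded = [s.encode('ascii') for s in strings]
    width = max(len(b) for b in encoded)
    # 'S<width>' reserves the same number of bytes for every element.
    arr = np.array(encoded, dtype=f'S{width}')
    with h5py.File(path, 'w') as hf:
        hf.create_dataset('fixed_length_str', data=arr, compression='gzip')
    with h5py.File(path, 'r') as hf:
        ds = hf['fixed_length_str']
        itemsize = ds.dtype.itemsize
        values = [b.decode('ascii') for b in ds[:]]
    return itemsize, values


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'store_fixed.hdf5')
        print(roundtrip_fixed_strings(path, ['abcd', 'bce', 'cd']))
```

Every element occupies the width of the longest string, which is the space cost described above when string lengths vary a lot.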

For details, see h5py documentation here: h5py: Variable-length strings

Note, the API was updated in h5py 2.10. The older API is documented here: h5py: Older vlength API

Upvotes: 1
