durden2.0

Reputation: 9542

Optimize chunkshape parameter of pytables/HDF5 for reading entire column

I'm trying to improve the performance of my PyTables/HDF5 code by specifying the chunkshape when creating a table. I can't figure out what the real dimensions or format of the chunkshape parameter are. I can see from the code that it ultimately ends up as a tuple with a single element.

Is this single element supposed to be the number of rows, bytes, or what?

My specific issue is that I have existing code that creates an HDF5 table with 20 columns. I would like to change the table's chunking so that each column is stored contiguously on disk, thereby optimizing for reading an entire column at a time.

I tried just setting the chunkshape to 20 (the number of columns), but this dramatically decreased the performance of reading an entire column. Should the chunkshape be set to the width (in bytes) of a single row?
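
Here is a minimal sketch of the kind of code I'm working with (the file name, schema, and column names are illustrative, not my real table):

```python
import numpy as np
import tables

# Illustrative 20-column schema; the real table differs.
description = {"col%02d" % i: tables.Float64Col(pos=i) for i in range(20)}

with tables.open_file("data.h5", mode="w") as f:
    table = f.create_table("/", "mytable", description=description,
                           chunkshape=(20,))  # what I tried: 20 == number of columns
    table.append(np.zeros(100000, dtype=table.dtype))
    # Reading one whole column is the access pattern I want to be fast:
    col = table.col("col03")
```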

I would just like to know what the chunkshape should be if:

  1. I want to read an entire column as fast as possible.
  2. I know exactly how many columns are in the table.
  3. I cannot just simply change the table to have the existing rows as columns and vice-versa for backwards-compatibility reasons.

Upvotes: 1

Views: 1535

Answers (1)

Ümit

Reputation: 17489

The chunkshape in PyTables specifies the number of elements along each dimension that are stored contiguously on disk as a single chunk (that is why it is a tuple, with one entry per dimension).

So, for instance, if your dataset is 10,000 x 20 (10,000 rows, 20 columns) and you always access a single column at a time, then each chunk should contain as much of a column as possible, given your best chunk size (see here for more details).

If you know how many rows you will have and they are not that huge, you could specify a chunkshape of (10000, 1) (or fewer rows per chunk). Reading all 20 columns then takes 20 chunk accesses.
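
As a minimal sketch, assuming the 20 columns are stored as a 2-D CArray (a Table stores whole rows together, so per-column chunking only applies to array-style datasets; the file and node names here are made up):

```python
import numpy as np
import tables

with tables.open_file("data.h5", mode="w") as f:
    # chunkshape=(10000, 1): each chunk holds one entire column.
    carr = f.create_carray("/", "data", atom=tables.Float64Atom(),
                           shape=(10000, 20), chunkshape=(10000, 1))
    carr[:, :] = np.random.rand(10000, 20)

with tables.open_file("data.h5", mode="r") as f:
    col = f.root.data[:, 3]  # reading one column touches exactly one chunk
```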

Upvotes: 4
