Reputation: 1
I have thousands of HDF files, around 30GB in total. The files were created through Vaex, but the names and number of columns are not the same in every file. I want to combine them into a single HDF file and a single dataframe. Is there any solution? Thanks.
Upvotes: 0
Views: 160
Reputation: 813
If your files are on disk, you can do something like:
df = vaex.open_many(list_of_filepaths)
Alternatively, if your files are named according to some pattern, for example part1.hdf5, part2.hdf5, etc., you can do something like:
# Supports glob expressions
df = vaex.open('part*.hdf5')
After that you can use df.export(...) to export to a single HDF5 file.
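Putting the steps together, here is a minimal sketch of the whole workflow. The helper names (collect_hdf5_paths, combine_to_single_file), the directory layout and the output path are hypothetical and only for illustration; the only Vaex calls used are vaex.open_many and df.export from above.

```python
from pathlib import Path


def collect_hdf5_paths(directory, pattern="*.hdf5"):
    """Return the matching file paths under `directory`, sorted by name."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


def combine_to_single_file(directory, output_path, pattern="*.hdf5"):
    """Open every matching file as one DataFrame and export it to one file."""
    # Imported here so the path helper above works without vaex installed.
    import vaex

    paths = collect_hdf5_paths(directory, pattern)
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")

    df = vaex.open_many(paths)  # concatenates the files into one DataFrame
    df.export(output_path)      # writes the combined data to a single file
    return df
```

You would call it as combine_to_single_file("data", "combined/all.hdf5"). Write the output outside the input directory (or under a name the pattern does not match), otherwise a second run will pick up the combined file as an input. Note that the sort is by name, so part10.hdf5 comes before part2.hdf5. Since your files do not all share the same columns, I would try this on a handful of files first and check how the columns that are missing from some files come out in your Vaex version.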
Upvotes: 0