Reputation: 16478
I have several different data frames that are related (and there are IDs to join them if needed). However, I don't always need them at the same time.
Since they are quite large, does it make sense to store them in separate HDF stores? Or is the cost of carrying around the "unused" frames negligible when I'm working on the other frames in the same file?
Upvotes: 3
Views: 98
Reputation: 210822
Theoretically, if you can place your HDF files on separate IO subsystems (different spindles, different storage systems, etc.), you can try to read your DFs in parallel. In practice, I would test it for your particular case, on your hardware and with your data.
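One way to run that test is to time reading the same frame from a shared file and from a dedicated file. The following is only a rough sketch: the file names `shared.h5` and `only_a.h5`, the keys `a` and `b`, and the helper names are made up for illustration, and the random data stands in for your real frames. Repeated reads are likely served from the OS page cache, so treat the numbers as a relative hint rather than a benchmark.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_read(path, key, repeats=3):
    """Return the best wall-clock time (seconds) for reading one key."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pd.read_hdf(path, key=key)
        best = min(best, time.perf_counter() - start)
    return best


def compare_layouts(workdir, rows=100_000):
    """Time reading frame 'a' from a shared file and from its own file."""
    # Placeholder data; substitute your own frames here.
    a = pd.DataFrame(np.random.rand(rows, 4), columns=list("wxyz"))
    b = pd.DataFrame(np.random.rand(rows, 4), columns=list("wxyz"))

    shared = os.path.join(workdir, "shared.h5")      # holds both frames
    separate = os.path.join(workdir, "only_a.h5")    # holds only 'a'

    a.to_hdf(shared, key="a", mode="w")
    b.to_hdf(shared, key="b", mode="a")
    a.to_hdf(separate, key="a", mode="w")

    return {
        "shared": time_read(shared, "a"),
        "separate": time_read(separate, "a"),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(compare_layouts(workdir))
```

To probe the parallel-IO idea, you would put the files on the different devices you actually have and time concurrent reads there.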
Another advantage of separate files: if you remove a huge DF from an HDF store that holds several DFs, or dramatically shrink it, the file's size stays unchanged, because HDF5 does not reclaim freed space automatically. With a separate file, you can simply delete it and free the unused space.
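You can check this on your own setup with a small experiment like the one below. It is a sketch under assumptions: the keys `big` and `small`, the file name, and the helper `size_after_removal` are hypothetical, and the exact byte counts will depend on your pandas, PyTables and HDF5 versions, so no expected output is shown.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def size_after_removal(path, rows=200_000):
    """Write two frames, remove the big one, and report file sizes.

    Returns (bytes_before, bytes_after, remaining_keys).
    """
    big = pd.DataFrame(np.random.rand(rows, 5), columns=list("abcde"))
    small = pd.DataFrame({"id": range(10)})

    with pd.HDFStore(path, mode="w") as store:
        store.put("big", big)
        store.put("small", small)
    before = os.path.getsize(path)

    with pd.HDFStore(path, mode="a") as store:
        store.remove("big")          # drops the node, not the disk space
        remaining = store.keys()
    after = os.path.getsize(path)

    return before, after, remaining


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        before, after, remaining = size_after_removal(
            os.path.join(workdir, "frames.h5")
        )
        print("before:", before, "after:", after, "keys:", remaining)
```

If you keep everything in one file, the space can be recovered by rewriting the file, for example with PyTables' `ptrepack` command-line tool.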
Upvotes: 1
Reputation: 294218
The cost of carrying unused frames is the same whether they are in another file or the same file. Ask yourself whether it would be better to store this SQL table in another database or in the same database. If they are related, keep them in the same store.
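As a minimal sketch of the single-store approach: reading by key loads only the frame you ask for, and the related frame is there when you need the join. The frame names, columns, and the helper `build_and_read` are invented for illustration and are not taken from the question.

```python
import os
import tempfile

import pandas as pd


def build_and_read(path):
    """Store two related frames in one file, then read one and join both."""
    # Hypothetical related frames sharing a 'customer_id' column.
    customers = pd.DataFrame(
        {"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}
    )
    orders = pd.DataFrame(
        {"order_id": [10, 11, 12],
         "customer_id": [1, 1, 3],
         "amount": [5.0, 7.5, 2.0]}
    )

    with pd.HDFStore(path, mode="w") as store:
        store.put("customers", customers)
        store.put("orders", orders)

    # Only the 'orders' node is read here; 'customers' stays on disk.
    only_orders = pd.read_hdf(path, key="orders")

    # When both are needed, read them from the same store and join on the id.
    with pd.HDFStore(path, mode="r") as store:
        joined = store["orders"].merge(store["customers"], on="customer_id")

    return only_orders, joined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        only_orders, joined = build_and_read(os.path.join(workdir, "shop.h5"))
        print(only_orders)
        print(joined)
```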
Upvotes: 0