Ginger

Reputation: 8660

Multiple files or a single file for HDFStore?

I am converting 100 csv files into dataframes and storing them in an HDFStore.

What are the pros and cons of

a - storing each csv file in its own HDFStore file (100 files in total)?

b - storing all the csv files as separate items in a single HDFStore?

Performance aside, I am asking because I am having stability issues and my HDFStore files often get corrupted. So, for me, there is a risk associated with a single HDFStore. However, I am wondering whether there are benefits to having a single store.

Upvotes: 0

Views: 243

Answers (1)

Jeff

Reputation: 129068

These are the differences:

multiple files

  1. with multiple files, a failed write (e.g. a power failure mid-write) can only corrupt the one file being written
  2. you can parallelize writing across multiple files (note: never, ever try to write in parallel to a single file, as this will corrupt it!!!)
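
To illustrate the parallel-writing point, here is a minimal sketch, not a definitive implementation. It assumes pandas with PyTables installed; the helper names (`h5_path_for`, `convert_one`, `convert_all`), the `"data"` key, and the one-`.h5`-per-CSV naming scheme are hypothetical choices made for the example. Each worker process writes to a different file, so no two writers ever touch the same file.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def h5_path_for(csv_path, out_dir):
    """Map e.g. data/a.csv to <out_dir>/a.h5 (hypothetical naming scheme)."""
    return Path(out_dir) / (Path(csv_path).stem + ".h5")


def convert_one(csv_path, out_dir):
    """Read one CSV and write it to its own HDF5 file."""
    # Imported here so the path helper above is usable without pandas;
    # DataFrame.to_hdf additionally requires PyTables.
    import pandas as pd

    target = h5_path_for(csv_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    df = pd.read_csv(csv_path)
    # mode="w" creates a fresh file; each worker owns exactly one target
    df.to_hdf(target, key="data", mode="w")
    return target


def convert_all(csv_paths, out_dir, workers=4):
    """Convert CSVs in parallel; every task writes a different file."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert_one, p, out_dir) for p in csv_paths]
        return [f.result() for f in futures]
```

Calling `convert_all(list_of_csv_paths, "store")` from a script (under an `if __name__ == "__main__":` guard) would produce one `.h5` file per CSV. Note that this sketch assumes the CSV file names are unique; two CSVs with the same stem would map to the same target file.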

single file

  1. grouping of logical sets

IMHO the advantages of multiple files outweigh those of a single file, as you can easily replicate the grouping properties by using subdirectories.
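
As a rough sketch of how subdirectories can stand in for groups: a store-style key such as `/prices/2020` can be mapped to a file path like `store/prices/2020.h5`. The function names, the `.h5` suffix, and the example keys below are assumptions made for illustration, not part of pandas.

```python
from pathlib import Path


def file_for_key(root, key):
    """Translate a store-style key ('/prices/2020') into root/prices/2020.h5."""
    parts = [p for p in key.split("/") if p]
    if not parts:
        raise ValueError("key must name at least one item")
    *groups, item = parts
    return Path(root).joinpath(*groups, item + ".h5")


def list_group(root, group):
    """List the items of a 'group' by listing the .h5 files in its directory."""
    group_dir = Path(root).joinpath(*[p for p in group.split("/") if p])
    # Strip only the trailing ".h5" so item names containing dots survive
    return sorted(p.name[: -len(".h5")] for p in group_dir.glob("*.h5"))
```

Each item then lives in its own file, so corruption or a parallel write affects only that one item, while the directory tree keeps the logical grouping.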

Upvotes: 1
