Stephen

Reputation: 373

Storing XML Data in Thousands of Small Files

Is it better to store data in thousands of separate files or in a few XML files?

The data is shared between multiple devices that regularly update individual pieces of it. To minimize conflict, each object would be stored in a single file named after the GUID.

For instance, there might be 1000 projects stored in 1000 XML files and 500 categories stored in another 500 files.
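As a rough illustration of the layout described above, here is a minimal Python sketch that writes each object to its own XML file named after its GUID. The directory layout (`<root>/<kind>/<guid>.xml`), the function names and the flat field structure are all assumptions made for the example, not something any of the apps mentioned is known to use.

```python
import os
import tempfile
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path


def new_guid() -> str:
    """Return a fresh random GUID as a string."""
    return str(uuid.uuid4())


def save_object(root_dir: Path, kind: str, guid: str, fields: dict) -> Path:
    """Write one object to <root_dir>/<kind>/<guid>.xml, replacing any old copy.

    Assumes every key in `fields` is a valid XML element name.
    """
    target_dir = root_dir / kind
    target_dir.mkdir(parents=True, exist_ok=True)

    element = ET.Element(kind, attrib={"id": guid})
    for name, value in fields.items():
        ET.SubElement(element, name).text = str(value)

    # Write to a temporary file in the same directory, then rename it over
    # the target, so the target file is never left half-written.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        ET.ElementTree(element).write(tmp, encoding="utf-8", xml_declaration=True)
    target = target_dir / f"{guid}.xml"
    os.replace(tmp_name, target)
    return target


def load_object(root_dir: Path, kind: str, guid: str) -> dict:
    """Read one object back as a dict of element name -> text."""
    element = ET.parse(root_dir / kind / f"{guid}.xml").getroot()
    return {child.tag: child.text for child in element}
```

With this layout, two devices editing different projects touch different files, so the sync service only has to reconcile a conflict when both devices change the same object.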

Applications like OmniFocus and 1Password currently use a variant of this approach. OmniFocus zipped some of the files but still suffered from performance issues on WebDAV drives.

Users would typically have thousands of files, and some would have tens of thousands.

In my particular case, the data is stored on a service like Dropbox, so a central database solution is not available.

Devices modifying the data include iOS and Android devices, Macs and PCs.

The files don't necessarily need to be XML. It just seems to be a convenient way to store data.

I'm worried about performance and other issues with this many files. I already have a working solution using a dozen files (broken into master and periodic change files), but that solution has many edge cases, and I wonder if one file per UID might be cleaner.

Thoughts?

Upvotes: 0

Views: 269

Answers (2)

Mark O'Connor

Reputation: 77981

Normally I'd recommend loading your data into some sort of database. This makes it simpler to search, manipulate and extract in other formats.

Having said that, I once had to design an application that depended on thousands of CSV files, totalling several million lines of data. One of the design goals was to keep the data mastered in its original format, so to assist with searching, each file was loaded into a Solr index.

If you haven't seen Solr in action, I highly recommend it. Once your data is indexed, it provides a JSON-based REST API for searching your content. The indexes are simple to keep up to date and very fast.
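To give a feel for that API, here is a small Python sketch that builds a query against Solr's standard `select` handler and decodes the JSON response. The base URL, the core name `records` and the field name `title` are hypothetical placeholders; they depend entirely on how a given Solr instance is set up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for the /select handler of the given core."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def solr_search(base_url: str, core: str, query: str, rows: int = 10) -> list:
    """Run the query and return the list of matching documents.

    Needs a reachable Solr server; raises URLError otherwise.
    """
    with urlopen(solr_select_url(base_url, core, query, rows)) as response:
        payload = json.load(response)
    return payload["response"]["docs"]


# Example with hypothetical names: search the "records" core by title.
url = solr_select_url("http://localhost:8983", "records", "title:budget", rows=5)
```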

Upvotes: 0

Michael Kay

Reputation: 163418

Sorry, but the answer is: it depends. Some things will be easier/faster with a few large files, some things will be easier/faster with a lot of small files.

Big files tend to mean that you will often be parsing data you don't need, and will be allocating memory to hold data that you don't need.

Small files mean that you need some way of keeping track of all your files and finding the right ones for a given operation.
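One simple way to keep track of many small files is to build an index from the directory listing. The sketch below is only an illustration, assuming a hypothetical layout of `<root>/<kind>/<guid>.xml`; it maps each GUID to its kind, path and modification time, which is enough to find a single object or list what changed since the last sync.

```python
from pathlib import Path


def build_index(root_dir: Path) -> dict:
    """Map each GUID (the file name without .xml) to its kind, path and mtime."""
    index = {}
    for path in root_dir.glob("*/*.xml"):
        index[path.stem] = {
            "kind": path.parent.name,        # e.g. "project" or "category"
            "path": path,
            "mtime": path.stat().st_mtime,   # seconds since the epoch
        }
    return index


def changed_since(index: dict, timestamp: float) -> list:
    """Return the GUIDs whose files were modified after `timestamp`, sorted."""
    return sorted(guid for guid, entry in index.items() if entry["mtime"] > timestamp)
```

Scanning tens of thousands of files on every launch may itself be slow, so in practice the index would probably be cached and refreshed incrementally.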

I wouldn't want to handle more than 1000 files or so without an XML database.

Upvotes: 2
