gsiener

Reputation: 883

How should I (intelligently) store and archive large XML files for a data import?

We've got a Rails app that processes large amounts of imported XML data. Right now we're storing these ~5 MB XML docs in Postgres. This is not ideal given that we only use each XML doc once or twice for parsing. We'd like an intelligent way of storing and archiving these docs without overly complicating the retrieval process for the sake of space savings. We've considered moving the docs to Mongo (which we're also using), but then aren't we just artificially boosting the memory requirements of our MongoDB servers?

What's the best way for us to deal with this?

Upvotes: 2

Views: 855

Answers (4)

Antonio Cangiano

Reputation: 733

You may want to look into DB2's pureXML capabilities. To play with it, you can download the free DB2 Express-C version here. For the record, IBM is also the only database vendor that officially supports its own Ruby driver and Rails adapter, so you wouldn't be on your own.

Upvotes: 4

user533832

Reputation:

What harm are they doing where they are? They will take up 'space' wherever you put them.

If you are confident you will never need them again, then there is a case for archiving to less expensive storage (e.g. tape); otherwise, whatever you do will 'overly complicate the retrieval process'.

You could consider compressing them in place if you are not already doing so; a rough sketch follows.
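As an illustrative sketch (the file name and storage column are assumptions, not from the question), in-place compression with Ruby's standard Zlib could look like this:

```ruby
require "zlib"

# Compress the raw XML before storing it; ~5 MB of verbose XML
# typically deflates to a small fraction of its original size.
xml = File.read("import.xml")            # hypothetical input file
compressed = Zlib::Deflate.deflate(xml)

# ... write `compressed` to a bytea/blob column instead of the text column ...

# Inflate on the rare occasion the doc is needed again.
original = Zlib::Inflate.inflate(compressed)
```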

Upvotes: 1

moritz

Reputation: 5224

I would just store a link to the file in the DB, since you only use each file for parsing once or twice, and then load the file from the given link. Another approach is to use an XML DB, e.g. eXist.
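A minimal sketch of the link-in-DB approach, assuming a hypothetical `Import` model with a `file_path` string column in place of the XML column:

```ruby
require "active_record"  # loaded automatically in a Rails app
require "nokogiri"

# Hypothetical model: Postgres keeps only the path,
# the filesystem keeps the document itself.
class Import < ActiveRecord::Base
  def xml
    File.read(file_path)
  end
end

import = Import.create!(file_path: "/data/imports/feed-1234.xml")
doc = Nokogiri::XML(import.xml)  # load and parse on demand, then discard
```

This keeps retrieval trivial (one `File.read`) while each database row stays a few bytes instead of ~5 MB.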

Upvotes: 5

Tom Micheline

Reputation: 911

You could try eXist, an XML database. If you are just archiving them, though, why not just store them in a directory tree?
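For the directory-tree option, here is a sketch of one reasonable layout; the date-sharded paths and gzip step are assumptions, not something the answer prescribes:

```ruby
require "fileutils"
require "zlib"

# Build a year/month-sharded archive path for a given import id.
def archive_path(id, root = "/archive/xml")
  File.join(root, Time.now.strftime("%Y/%m"), "#{id}.xml.gz")
end

xml_string = File.read("import.xml")  # hypothetical doc to archive
path = archive_path(1234)
FileUtils.mkdir_p(File.dirname(path))
Zlib::GzipWriter.open(path) { |gz| gz.write(xml_string) }
```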

Upvotes: 5
