user172163

Reputation:

The best way to store millions of ~100 KB (average) records

We are in a situation where we need to store millions of records every day.


Data Structure Model:

Our [RAW TEXT] is different each time, ranging from roughly 30 KB to 300 KB, about 100 KB on average. We never need to search within [RAW TEXT], and access by id is only required maybe once a month for some of the records.

Right now we store all of it (attributes and data) in MongoDB because of its great insert speed and performance. But the database is growing rapidly; it is already about 85 GB, and within the next few days this will become a problem for us.

Here is the question: how would you implement this?
Is it really worth changing the database and software structure to store the [RAW TEXT] data in the file system (/datafiles/x/y/z/id.txt)?
Will this change have a significant impact on system performance?

Upvotes: 1

Views: 348

Answers (1)

paddy

Reputation: 63481

If you're concerned about storage, why not compress the text data? Decent text compression should be about 10:1.
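As a minimal sketch of what that looks like with Python's standard zlib module (the 10:1 figure is only a rough estimate and depends on how repetitive your text is; the sample file name is just a stand-in for one of your records):

```python
import zlib

# raw_text stands in for one of your ~100 KB [RAW TEXT] values;
# "sample_record.txt" is a hypothetical example file.
with open("sample_record.txt", "rb") as f:
    raw_text = f.read()

compressed = zlib.compress(raw_text, level=9)  # maximum compression
print(f"{len(raw_text)} -> {len(compressed)} bytes "
      f"(~{len(raw_text) / len(compressed):.1f}:1)")
```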

Personally, I'd take the file-based approach, because it sounds like your main function is archiving. I'd write all the info into the file that's needed to regenerate the database record, compress it, and store it in some kind of sensible directory structure based on the key. The reason being that it's easy to start a new disk or move sections of the data off to archival storage.
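A rough sketch of that idea, assuming a numeric id and the /datafiles/x/y/z/id.txt layout from the question (the function name, field names, and exact path scheme here are illustrative, not a fixed format):

```python
import gzip
import json
import os

def archive_record(record_id: int, attributes: dict, raw_text: str,
                   root: str = "/datafiles") -> str:
    """Write everything needed to regenerate the DB record as one compressed file."""
    # Spread files across directories based on the key, e.g. id 123456789
    # becomes /datafiles/123/456/789/123456789.json.gz
    key = f"{record_id:09d}"
    directory = os.path.join(root, key[0:3], key[3:6], key[6:9])
    os.makedirs(directory, exist_ok=True)

    payload = {"id": record_id, "attributes": attributes, "raw_text": raw_text}
    path = os.path.join(directory, f"{record_id}.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(payload, f)
    return path
```

Because the top-level directory is part of the key, moving an entire range of records to another disk or to archival storage is just moving one directory tree.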

If you are collecting 10 million records each day with compression, that amounts to about 100GB per day. You might want to make a 'Disk ID' to form part of the key, as at this rate you'd fill up a 2TB disk in about 3 weeks. Even a 20TB RAID array would fill up in about 6 months.
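The arithmetic behind those figures, assuming roughly 10 KB per record after compression:

```python
records_per_day = 10_000_000
compressed_size = 10 * 1024            # ~10 KB per record after ~10:1 compression

daily_bytes = records_per_day * compressed_size      # ~100 GB per day
days_to_fill_2tb = 2e12 / daily_bytes                # ~20 days (~3 weeks)
days_to_fill_20tb = 20e12 / daily_bytes              # ~200 days (~6-7 months)

print(f"{daily_bytes / 1e9:.0f} GB/day, 2 TB in {days_to_fill_2tb:.0f} days, "
      f"20 TB in {days_to_fill_20tb:.0f} days")
```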

Upvotes: 1
