Money Parashar

Reputation: 56

How to handle a table with billions of rows and lots of read and write operations

Please guide me through my problem

I receive data every second at my server from different sources. The data is structured; I parse it, and now I have to store the parsed data in a single table, at around 5 lakh (500,000) records a day. I also run a lot of read operations on this table daily. After some time this table will hold billions of records.

How should I solve this problem? Should I go with an RDBMS, HBase, or some other option?

Upvotes: 0

Views: 1070

Answers (4)

Samir

Reputation: 11

If your writes are at 1/second, most of the available databases should be able to support this. Since you are looking for a long-term, persistent store, consider a database that scales horizontally, so that you can add nodes as and when you need more capacity. Databases with auto-sharding abilities would be a great fit (Cassandra, Aerospike, ...). Make sure you choose an auto-sharding database that doesn't require the client/application to manage which data is stored where. In-memory databases would not fit the bill in this case.
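To make the idea of sharding concrete, here is a minimal sketch of hash-based partitioning. It only illustrates the principle; it is not how Cassandra or Aerospike implement it internally, and the function name `shard_for`, the key format, and the shard count are assumptions made up for this example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash.

    Hypothetical helper: an auto-sharding database does this kind of
    mapping for you, so the application never has to.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    # Assumed key format: "<source>:<sequence number>".
    sample_keys = [f"source-{i % 50}:{i}" for i in range(10_000)]
    for shard, count in sorted(distribution(sample_keys, 4).items()):
        print(f"shard {shard}: {count} keys")
```

The point of choosing an auto-sharding database is that this mapping, plus rebalancing when nodes are added, happens inside the database rather than in your application code.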

Once your storage reaches a few terabytes, you will need to pay attention to database scale and throughput so that your infrastructure cost doesn't bog you down.
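A rough back-of-the-envelope estimate helps to see when that point arrives. The 500,000 records per day comes from the question; the record size of 1,000 bytes is purely an assumption for illustration (the question does not state it), and replication and index overhead are ignored.

```python
SECONDS_PER_DAY = 86_400
RECORDS_PER_DAY = 500_000          # "5 lakh" per day, from the question
ASSUMED_BYTES_PER_RECORD = 1_000   # assumption: not given in the question


def avg_writes_per_second(records_per_day: int) -> float:
    """Average write rate if records arrive evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


def days_to_reach(target_records: int, records_per_day: int) -> float:
    """Days needed to accumulate target_records at a constant daily rate."""
    return target_records / records_per_day


def gigabytes_per_year(records_per_day: int, bytes_per_record: int) -> float:
    """Raw data volume per year in GB (10**9 bytes), without overhead."""
    return records_per_day * bytes_per_record * 365 / 1e9


if __name__ == "__main__":
    print(f"avg writes/sec: {avg_writes_per_second(RECORDS_PER_DAY):.1f}")
    print(f"days to 1 billion rows: {days_to_reach(1_000_000_000, RECORDS_PER_DAY):.0f}")
    print(f"GB/year: {gigabytes_per_year(RECORDS_PER_DAY, ASSUMED_BYTES_PER_RECORD):.1f}")
```

Under these assumptions the average rate is about 6 writes per second, the first billion rows takes 2,000 days (roughly five and a half years), and raw data grows by about 180 GB per year. Plug in your real record size and peak rate before drawing conclusions.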

Your query patterns will be crucial in choosing the right solution. You may not want to index everything; instead, fine-tune what you index so that you query on the keys and/or only those data elements within your records that you actually need. That keeps index storage overhead, and hence cost, under control. You should also check whether a candidate database supports time-range queries, which seem to be part of your typical query pattern.

Last but not least, you will want your queries processed as fast as possible. You should try out Cassandra (good for horizontal scaling, less so on throughput) and Aerospike (good for horizontal scaling, pretty good on throughput).
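One common way to support time-range queries without indexing everything is to bucket records by source and day, so that a range query touches only a handful of known keys. The sketch below assumes a key format of `<source_id>#<YYYYMMDD>` and timezone-aware timestamps; both are assumptions made for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def bucket_key(source_id: str, ts: datetime) -> str:
    """Key of the daily bucket a record belongs to (UTC day).

    Expects a timezone-aware datetime.
    """
    return f"{source_id}#{ts.astimezone(timezone.utc):%Y%m%d}"


def buckets_for_range(source_id: str, start: datetime, end: datetime) -> List[str]:
    """All daily bucket keys that a [start, end] time range touches."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append(f"{source_id}#{day:%Y%m%d}")
        day += timedelta(days=1)
    return keys


if __name__ == "__main__":
    start = datetime(2024, 1, 30, 23, 0, tzinfo=timezone.utc)
    end = datetime(2024, 2, 1, 1, 0, tzinfo=timezone.utc)
    print(buckets_for_range("s1", start, end))
```

A reader then fetches only the listed buckets and filters the first and last one by exact timestamp. Whether a day is the right bucket size depends on how many records each source produces.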

Upvotes: 0

Peter Corless

Reputation: 432

My question is regarding what sort of database repository you wish to use: RAM? Flash? Disk?

RAM responds in nanoseconds. Flash in microseconds. Disk in milliseconds.

And, of course, you might want to create a hybrid of all three, especially if some keys were "hotter" than others -- more likely to be read over and over.
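As an illustration of the "hot keys" idea, here is a minimal sketch of a small in-memory LRU cache placed in front of a slower store. The class name `ReadThroughCache` and the `load` callback are hypothetical; a real deployment would more likely rely on the database's own memory/flash tiering or a dedicated cache.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """Keep the most recently used keys in RAM; load the rest on demand."""

    def __init__(self, load: Callable[[Hashable], Any], capacity: int) -> None:
        self._load = load            # fetches a value from the slower tier
        self._capacity = capacity    # max number of entries kept in RAM
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            # Hot key: serve from RAM and mark as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cold key: go to the slower tier, then remember the value.
        self.misses += 1
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return value


if __name__ == "__main__":
    slow_store = {"a": 1, "b": 2, "c": 3}   # stand-in for flash or disk
    cache = ReadThroughCache(slow_store.__getitem__, capacity=2)
    for key in ["a", "b", "a", "c", "b"]:
        cache.get(key)
    print(f"hits={cache.hits} misses={cache.misses}")
```

The more skewed your reads are towards a few keys, the more such a RAM tier pays off; with uniformly random reads it helps very little.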

If you want to do a lot of fast processing, and scale it "wide" (many CPUs in a cluster for faster read performance), you are a likely candidate for a NoSQL database. I'd need to know more about your data model to know whether it would work as a key-value store, and how it might require more internal structure such as JSON/BSON.

Caveat: I am biased towards Aerospike, my employer. Yet you should do some kicking-of-the-tires with us or any other key-value stores you're considering to see if it would work with your data before betting the farm. Obviously, each NoSQL vendor would claim itself to be "the best," but much depends on your use case. A vendor's "solution" will only work well for certain data models. We tend to be best for fast in-memory RAM/Flash or hybrid implementations.

Upvotes: 1

Abhijeet Dhumal

Reputation: 1809

You can use HBase as a NoSQL database in this case. To make search faster and more customizable, use Elasticsearch along with HBase.

Upvotes: 0

Amar

Reputation: 3845

If your table is going to reach billions of records, a traditional RDBMS will struggle to scale.

Regarding HBase, whether it is a good solution depends on your requirements. If you need real-time reads, HBase only helps when you look up a specific row key. If you want to do random reads filtered on different columns, HBase won't be an ideal solution. HBase scales really well for updates.

I would suggest designing your HBase schema carefully and storing your data in a way that suits your queries.

However, if you are interested in running aggregation queries, you can also map your HBase table to an external table in Hive and run SQL-like queries on your data.
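Since HBase keeps rows sorted by the bytes of the row key, schema design mostly comes down to choosing that key. Below is a minimal sketch of one common pattern, a composite key of source id plus reversed timestamp, so that each source's rows sit together with the newest first. The function name `row_key`, the `\x00` separator, and the assumption that source ids never contain a zero byte are all made up for this example; it only builds the key bytes and does not talk to HBase.

```python
import struct

MAX_INT64 = 2**63 - 1


def row_key(source_id: str, ts_millis: int) -> bytes:
    """Build a composite row key: <source_id> 0x00 <reversed timestamp>.

    Assumption: source_id never contains a zero byte, so the separator
    is unambiguous and a prefix scan on source_id + 0x00 is safe.
    """
    # Subtracting from a large constant makes newer rows sort first.
    reversed_ts = MAX_INT64 - ts_millis
    # Big-endian packing keeps numeric order equal to byte order.
    return source_id.encode("utf-8") + b"\x00" + struct.pack(">Q", reversed_ts)


def scan_prefix(source_id: str) -> bytes:
    """Prefix shared by every row key of one source."""
    return source_id.encode("utf-8") + b"\x00"


if __name__ == "__main__":
    keys = [row_key("sensor-7", ts) for ts in (1_000, 3_000, 2_000)]
    for key in sorted(keys):  # same ordering HBase would store them in
        print(key.hex())
```

With a key like this, "latest N records for a source" becomes a short prefix scan. If one source receives far more writes than the others, a key that starts with the source id can create a hot region, so check your write distribution first.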

Upvotes: 0
