Vinod
Vinod

Reputation: 949

Choosing NoSQL Database for Application with heavy DateRange Reports

I have an application in which features of the application used by any user gets continuously logged into a text file and then later (off peak hours) a scheduled application inserts these log records into MS SQL Server based RDBMS. As you can guess, this generate quite a lot of data (since single user session would generates many records and there are hundreds of users using the application concurrently all day long). The database currently has millions of records and is mainly used to generate date range based reports which gives an aggregate of which feature was used how many times for a given date range which is where the database is giving performance issues.

Now I am aware that straight forward solution to this would be to fix the Database design (if at all its possible!) to improve performance. But I am not supposed to suggest any SQL design changes. I am supposed to evaluate NoSQL databases with a data set similar to the one I explained above.

I start looking into NoSQL databases and I am bombarded with different types of NoSQL databases. With my initial study, I am pretty sure I can rule out Graph Oriented databases. Key-Value stores also seem inappropriate for my purpose as it will mostly allow me to query with only key as per whatever I understood from this paper on DynamoDB. Although the requirement is heavy queries on Date Range, option of querying by features is desirable.

That leaves me with Document Oriented databases and column family stores

So based on all these points, which database would you suggest for my case?

PS: Please do not suggest any cloud based solutions as I am not supposed to migrate data out on cloud!!!

Upvotes: 0

Views: 535

Answers (2)

ideawu
ideawu

Reputation: 2337

I would suggest Redis and SSDB, the built-in data type zset is the ideal data type to store time based data. One the sequence is stored(insert and delete at any time), you can find by time stamp.

Here is some code snipets:

$ssdb->zset('z', 'a', 1384423054);
$ssdb->zset('z', 'b', 1384423055);
$ssdb->zset('z', 'c', 1384423056);
$items = $ssdb->zrange('z', 1384423054, 1384423057, $limit=100);

The difference between these two databases is that Redis is memory only(disk for backup), but SSDB is disk based and uses memory as cache.

Upvotes: 0

Didier Spezia
Didier Spezia

Reputation: 73226

Anything supporting efficient range scans or partition scans will work fine for your use case.

This includes MongoDB, Cassandra, HBase, any RDBMS (especially the ones supporting partitioning). I would exclude CouchDB here, since I don't think Javascript map/reduce can compete with NoSQL/RDBMS engines implemented in C/C++/Java.

Column family stores are commonly used to work with time series. For instance, have a look at the OpenTSDB project, based on HBase.

Finally, you may also consider to just use a search engine instead of a NoSQL store or a RDBMS. Increasingly, search engines (such as Lucene) are used for analytic workloads. For instance, have a look at the Elasticsearch, Logstash, Kibana triplet.

Upvotes: 1

Related Questions