Max

Reputation: 2859

Most suitable data store for billions of indexes

So we're looking to store two kinds of indexes.

  1. First kind will be in the order of billions, each with between 1 and 1000 values, each value being one or two 64 bit integers.
  2. Second kind will be in the order of millions, each with about 200 values, each value between 1KB and 1MB in size.
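
A quick back-of-envelope sizing of the two kinds may help ground the discussion. The per-index averages below are assumptions (the question only gives ranges), so treat this as an order-of-magnitude sketch:

```python
# Back-of-envelope sizing; the averages are assumptions, not numbers from the question.
BYTES_PER_VALUE_K1 = 16            # one or two 64-bit integers per value
AVG_VALUES_K1 = 100                # assumed average of the 1-1000 range
N_INDEXES_K1 = 1_000_000_000       # "order of billions"

AVG_VALUE_BYTES_K2 = 100_000       # assumed average of the 1KB-1MB range
AVG_VALUES_K2 = 200                # "about 200 values"
N_INDEXES_K2 = 1_000_000           # "order of millions"

kind1 = N_INDEXES_K1 * AVG_VALUES_K1 * BYTES_PER_VALUE_K1   # ~1.6 TB
kind2 = N_INDEXES_K2 * AVG_VALUES_K2 * AVG_VALUE_BYTES_K2   # ~20 TB
total_tb = (kind1 + kind2) / 1e12
print(f"~{total_tb:.1f} TB before replication and storage overhead")
```

Under these assumptions the raw data lands in the tens-of-terabytes range, which is worth knowing before comparing how each candidate store shards.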

And our usage pattern will be something like this:

Now, we've considered quite a few databases; our favourites at the moment are Cassandra and PostgreSQL. However, our application is in Erlang, which has no production-ready bindings for Cassandra. A major requirement is that it can't require too much manpower to maintain. I get the feeling that Cassandra is going to throw up unexpected scaling issues, whereas PostgreSQL is just going to be a pain to shard, but at least for us it's a known quantity. We're already familiar with PostgreSQL, but not hugely well acquainted with Cassandra.

So. Any suggestions or recommendations as to which data store would be most appropriate to our use case? I'm open to any and all suggestions!

Thanks,

-Alec

Upvotes: 0

Views: 210

Answers (2)

DNA

Reputation: 42617

You haven't given enough information to support much of an answer re: your index design. However, Cassandra scales up quite easily by growing the cluster.

You might want to read this article: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

A more significant issue for Cassandra is whether it supports the kind of queries you need - scalability won't be the problem. From the numbers you give, it sounds like we are talking about terabytes or tens of terabytes, which is very safe territory for Cassandra.

Upvotes: 2

AlfredoVR

Reputation: 4307

Billions is not a big number by today's standards, so why not write a benchmark instead of relying on guesswork? That will give you a better basis for a decision, and it's really easy to do. Just install your target OS and each database engine, then run queries with, say, Perl (because I like it). It won't take you more than a day to do all this; I've done something like it before. A nice way to benchmark is to write a script that executes queries randomly, or with something like a Gaussian bell curve, "simulating" real usage. Then plot the data, or do it like a boss and just read the logs.
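
A minimal sketch of that kind of benchmark harness, in Python rather than Perl, with an in-memory SQLite table standing in for whichever engine you're testing (the table layout, row count, and query mix are all assumptions for illustration):

```python
import random
import sqlite3
import time

def build_store(n_rows=10_000):
    """Stand-in for the engine under test: an in-memory SQLite table of 64-bit values."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idx (key INTEGER PRIMARY KEY, val INTEGER)")
    db.executemany("INSERT INTO idx VALUES (?, ?)",
                   ((i, random.getrandbits(63)) for i in range(n_rows)))
    return db

def benchmark(db, n_queries=1_000, n_rows=10_000):
    """Issue lookups with a Gaussian key distribution to simulate skewed real usage."""
    start = time.perf_counter()
    hits = 0
    for _ in range(n_queries):
        # Cluster keys around the middle of the keyspace; % keeps them in range.
        key = int(random.gauss(n_rows / 2, n_rows / 10)) % n_rows
        if db.execute("SELECT val FROM idx WHERE key = ?", (key,)).fetchone():
            hits += 1
    elapsed = time.perf_counter() - start
    return hits, n_queries / elapsed  # hit count, queries per second

db = build_store()
hits, qps = benchmark(db)
print(f"{hits} hits, {qps:.0f} queries/sec")
```

To compare real candidates you would swap the SQLite connection for each engine's own client, keep the query generator identical, and plot the throughput numbers side by side.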

Upvotes: 2
