mirekphd
mirekphd

Reputation: 6821

Is Cassandra / ScyllaDB capable of handling millions of very wide data rows?

A new business need has emerged in our firm, where a relatively "big" data set needs to be accessed by online processes (with typical latency of up to 1 second). There is only one key with a high granularity / rows count measured in tens of millions and the expected number of columns / fields / value columns will likely exceed hundreds of thousands.

The key column is shared among all value columns, so key-value storage, while scalable, seems rather wasteful here. Is there any hope for using Cassandra / ScyllaDB (to which we gradually narrowed down our search) for such a wide data set, while ideally reducing also data storage needs by half (by storing the common key only once)?

Upvotes: 0

Views: 749

Answers (1)

Nadav Har'El
Nadav Har'El

Reputation: 13771

If I understand your use case correctly, your use case will have tens of millions of partitions (what you called rows), and each will have hundreds of thousands of different values in each of them (each those would be a clustering row in modern CQL - CQL no longer supports un-schema-ed wide rows). This is a fairly reasonable data set for Scylla and Cassandra.

But I want to add that I'm not sure the storage saving you are hoping for will really be there. Yes, Scylla/Cassandra will not need to store the partition key multiple times, but unless the partition key is very long, this will be often be negligible compared to the other overheads of storing the data on disk.

Another thing you should consider is your expected queries. How will you read from this database? If you'll want to read all 100,000 columns of a particular key, or a contiguous range of them, then the data model you described is perfect. However, if the expected use case is that you always plan to read a single column from a specific key, then this data model will be inefficient - a random-access read from the middle of a long partition is slower than reading the value from a short partition.

Upvotes: 1

Related Questions