TLD

Reputation: 8135

Cassandra's performance: data size and hardware

I need a high-performance database for multiple concurrent read/write operations on a large data table, and I don't know whether Cassandra is a good candidate. It would be great if you could help me clarify the questions below. Let's say I have a table with 5 million rows and 5 million columns.

1. Does Cassandra's performance scale linearly with the processing power of the hardware?

2. When I need to look up a column to check whether it exists, and insert a new one into the current table if it does not: is this operation fast?

3.If the current response time of read/write operations is slow, what are the possible ways I can improve it without changing the structure of my current table?

Additional information:
a. Transaction control is not important.
b. Replication depends on the use case. For tables that have multiple concurrent read/write operations, replication is not needed. For tables that have multiple concurrent reads, replication is needed.

Thank you very much.

Upvotes: 1

Views: 606

Answers (1)

doanduyhai

Reputation: 8812

1. Does Cassandra's performance scale linearly with the processing power of the hardware?

Cassandra's overall performance scales roughly linearly with the number of machines. For a single machine with spinning disks, the official recommendation is not to exceed 1 TB of data per machine. The limit for SSDs is higher, around 3 TB per machine. At least, that is what is recommended for Cassandra 2.1 and 2.2. With Cassandra 3.0 and its storage engine rewrite, those figures may be higher because server density has been improved.
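As a rough illustration of how those per-node density limits translate into cluster size, here is a small sketch. The 1 TB and 3 TB figures come from the paragraph above; the 20 TB dataset is a hypothetical example, and this ignores replication overhead:

```python
import math

def nodes_needed(total_data_tb, per_node_limit_tb):
    """Minimum node count to stay under a per-node data density limit."""
    return math.ceil(total_data_tb / per_node_limit_tb)

# Hypothetical 20 TB dataset (before replication):
print(nodes_needed(20, 1))  # 20 nodes at the 1 TB/node spinning-disk limit
print(nodes_needed(20, 3))  # 7 nodes at the ~3 TB/node SSD limit
```

With replication factor 3, the raw data volume (and hence the node count) would roughly triple.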

2. When I need to look up a column to check whether it exists, and insert a new one into the current table if it does not: is this operation fast?

Looking up data by primary key is quite fast thanks to a number of data structures that optimize disk access (bloom filters, partition key cache, partition summary ... see http://www.slideshare.net/doanduyhai/cassandra-introduction-apache-con-2014-budapest/48)

If you're not accessing data by primary key, the query will result in a sequential scan over a lot of data, and performance is no longer guaranteed.
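The bloom-filter point above can be sketched in plain Python. This is an illustrative toy, not Cassandra's actual implementation: the key property is that a bloom filter can answer "definitely absent" without touching disk, which is what keeps negative primary-key lookups (the "does this exist?" check) cheap.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: no false negatives, small chance of false positives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive num_hashes bit positions from independent hashes of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means definitely absent (skip the disk read entirely);
        # True means possibly present (go check the SSTable on disk).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("user:42")
print(bf.might_contain("user:42"))               # True: added keys are always found
print(BloomFilter().might_contain("user:42"))    # False: empty filter rejects everything
```

Cassandra keeps one bloom filter per SSTable in memory, so a read for a key that was never written can usually skip most SSTables without any disk I/O.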

3.If the current response time of read/write operations is slow, what are the possible ways I can improve it without changing the structure of my current table?

It should be the other way around: design your table structure and data model for fast reads (write operations are always fast with Cassandra). Appropriate hardware (SSDs) and memory (for the page cache) will also improve read/write performance. Apart from those parameters, the other tuning knobs (key cache size, bloom filter false-positive chance ...) only give marginal improvements.
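The "design for the read" advice can be sketched in plain Python with a hypothetical users-and-orders example. In Cassandra terms, the dict key below plays the role of the partition key: you denormalize so that the key you query by is the key the data is stored under, instead of scanning a normalized collection.

```python
# Hypothetical data: orders belonging to users.
orders = [
    {"user_id": 1, "order_id": 101},
    {"user_id": 2, "order_id": 102},
    {"user_id": 1, "order_id": 103},
]

# Anti-pattern for Cassandra: filtering on a non-key field forces a
# sequential scan over all the data (like a full-cluster scan).
scan_result = [o for o in orders if o["user_id"] == 1]

# Query-first modeling: store the data keyed by how you will read it,
# so the lookup is a direct partition access instead of a scan.
orders_by_user = {}
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(o)
direct_result = orders_by_user[1]

print(scan_result == direct_result)  # True: same answer, very different cost
```

In Cassandra you would typically create one table per query pattern, duplicating data across tables, since writes are cheap and reads by partition key are fast.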

b. Replication depends on the use case. For tables that have multiple concurrent read/write operations, replication is not needed.

Without replication, data loss is possible on any hardware failure. Are you sure losing data is acceptable for a table that is supposed to serve reads and writes?
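For context, Cassandra's quorum size is floor(RF / 2) + 1, where RF is the replication factor. A quick sketch of what that means for how many replica failures QUORUM reads and writes can tolerate:

```python
def quorum(replication_factor):
    """Cassandra quorum size: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def tolerable_failures(replication_factor):
    """Replicas that can be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)

print(tolerable_failures(1))  # 0: with RF=1, any node failure means unavailability and possible data loss
print(tolerable_failures(3))  # 1: the usual RF=3 setup survives one replica being down
```

This is why RF=3 with QUORUM consistency is such a common production configuration.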

Upvotes: 5
