Pavlo

Reputation: 1654

Cassandra as a replacement for PostgreSQL

Is a multi-node Cassandra cluster a good replacement for a single-node PostgreSQL instance? The data being stored is a time series. It is already tens of gigabytes and is expected to grow. The database should be integrated into a pipeline with Apache Spark, as a source and possibly as the destination for results. What is needed:
1) redundancy: the failure of one node shouldn't stop the system (all data should remain available)
2) speed: more nodes should mean less time per single insert/select for one client
3) concurrency: more nodes should mean better speed for simultaneous inserts/selects from different clients

Upvotes: 2

Views: 2724

Answers (2)

S. Stas

Reputation: 810

You've mentioned that you use time series data. On your three points:
1. You can tune the replication factor and consistency level so that the loss of a node is tolerated. So yes, Cassandra would be good as a replacement.
2. Inserts are really fast because Cassandra writes to memory first (an in-memory memtable, backed by an append-only commit log) and flushes to disk later. So yes, Cassandra would be good as a replacement.
3. Cassandra scales horizontally, close to linearly as nodes are added. So yes, Cassandra would be good as a replacement.
The drawback is that Cassandra is essentially a key-value (wide-column) store, so you have to model the table structure around your queries. PostgreSQL, as an RDBMS, is more flexible because it supports the full set of SQL operations.
You can read more about some pros and cons of using Cassandra with time series data here and here.

Upvotes: 2

Mandraenke

Reputation: 3266

For your points:

1) This depends on the keyspace replication factor (RF) and the consistency level (CL) you choose for your inserts and selects. To stay both available and consistent after losing one node, you need RF=3 on your keyspace and CL.QUORUM for both inserts and selects. QUORUM needs RF/2+1 replica nodes online, using integer division: with RF=3 that is 3/2+1=2, so you can handle the loss of one node; with RF=5 you would need 5/2+1=3 nodes online, so you can handle the loss of two.
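The quorum arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula from the answer; the function names `quorum` and `tolerated_failures` are made up for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at CL.QUORUM: RF // 2 + 1 (integer division)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while QUORUM reads and writes still succeed."""
    return replication_factor - quorum(replication_factor)


# Print the numbers for a few common replication factors.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Note that RF=2 tolerates no failures at QUORUM, which is why RF=3 is the smallest setting that meets the redundancy requirement.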

2) A single request is handled by a single node acting as coordinator in your cluster, so you do not gain much performance from extra nodes with single, synchronous requests. If you issue many requests and use async, your requests are spread across more nodes and you gain performance.
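As a rough sketch of the async pattern: the DataStax Python driver exposes `Session.execute_async`, which returns a future whose `result()` blocks until the request completes. The helper below only relies on that shape. `FakeSession`, the `readings` table and its columns are hypothetical stand-ins so the example runs without a cluster; with a real cluster you would pass the session returned by `Cluster.connect()` instead.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Iterable, List, Sequence, Tuple


def insert_all_async(session: Any, statement: str,
                     rows: Iterable[Sequence[Any]]) -> List[Any]:
    """Submit every insert without waiting, then block until all have finished."""
    futures = [session.execute_async(statement, row) for row in rows]
    return [future.result() for future in futures]


class FakeSession:
    """Hypothetical stand-in for a driver session (assumption, not driver code)."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.stored: List[Tuple[Any, ...]] = []

    def execute_async(self, statement: str, parameters: Sequence[Any]) -> Future:
        # Hand the "request" to a worker thread and return immediately.
        return self._pool.submit(self._store, parameters)

    def _store(self, parameters: Sequence[Any]) -> None:
        self.stored.append(tuple(parameters))


# Hypothetical table and columns, for illustration only.
INSERT = "INSERT INTO readings (sensor_id, ts, value) VALUES (%s, %s, %s)"

session = FakeSession()
rows = [("sensor-1", i, float(i)) for i in range(100)]
insert_all_async(session, INSERT, rows)
print(len(session.stored))
```

This sketch puts every request in flight at once, which is fine for small batches. For large loads you would want to cap the number of outstanding requests; the driver's `cassandra.concurrent.execute_concurrent_with_args` helper takes a `concurrency` argument for that purpose.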

3) With more clients you get the same effect: the coordinator is picked at random (although there is the TokenAwarePolicy, which picks an appropriate coordinator, namely a replica that owns the requested partition).

Upvotes: 2
