Reputation: 1143
Is Cassandra a good alternative for Hadoop as a data warehouse where data is append only and all updates in source databases should not overwrite the existing rows in the data warehouse but get appended. Is Cassandra really ment to act as a data warehouse or just as a database to store the results of batch / stream queries?
Upvotes: 5
Views: 6134
Reputation: 810
Cassandra can be used both as a data warehouse(raw data storage) and as a database (for final data storage). It depends more on the cases you want to do with the data.
You even may need to have both Hadoop and Cassandra for different purposes.
Assume, you need to gather and process data from multiple mobile devices and provide some complex aggregation report to the user.
So at first, you need to save data as fast as possible (as new portions appear very often) so you use Cassandra here. As Cassandra is limited in aggregation features, you load data into HDFS and do some processing via HQL scripts (assume, you're not very good at coding but great in complicated SQLs). And then you move the report results from HDFS to Cassandra in a dedicated reports table partitioned by user id.
So when the user wants to have some aggregation report about his activity in the last month, the application takes the id of active user and returns the aggregated result from Cassandra (as it is simple key-value search).
So for your question, yes, it could be an alternative, but the selection strategy depends on the data types and your application business cases.
You can read more information about usage of Cassandra
here
Upvotes: 3