Reputation: 87
I would like to stream data from a Cassandra table that is updated in real time. Yes, I know it is a database, but is there a way to do that? If so, should I keep an offset, and which CQL queries can I use?
Upvotes: 7
Views: 9549
Reputation:
To stream the data from Cassandra, you want to use the PageSize option like so:
iter := cass.Query(`SELECT * FROM cmuser.users;`).PageSize(100).Iter()
The above is an example in Go, using the gocql driver. The documentation for PageSize says:
PageSize will tell the iterator to fetch the result in pages of size n. This is useful for iterating over large result sets, but setting the page size too low might decrease the performance. This feature is only available in Cassandra 2 and onwards.
Upvotes: 0
Reputation: 1
I understand you were asking specifically about streaming data out of Cassandra, but I would like to suggest that a technology like Apache Kafka sounds like a much better fit for what you're trying to do. It is used by a number of large companies and has fantastic real-time performance.
There is a seminal blog post by Jay Kreps called "The Log: What every software engineer should know about real-time data's unifying abstraction" that does a great job of explaining Kafka's purpose and design. A key quote from the blog post summarizes Kafka's role:
Take all the organization's data and put it into a central log for real-time subscription.
Upvotes: 0
Reputation: 16576
Short answer is no.
Long answer: with a lot of difficulty and smart clustering keys, you can maybe do that. Basically, if you insert data with a clustering key that always increases (a timestamp, for example), you can repeatedly scan for clustering keys within a recent time window. This will of course miss out-of-order inserts that land outside your window, which may or may not be good enough for your use case.
Best answer in the future is Change Data Capture: https://issues.apache.org/jira/browse/CASSANDRA-8844
Upvotes: 7