Desired behaviour
I'm trying to configure Cassandra CDC so that commit log segments are flushed periodically to the cdc_raw directory (say, every 10 seconds).
Based on the documentation at http://abiasforaction.net/apache-cassandra-memtable-flush/ and https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configCDCLogging.html, I found:
memtable_flush_period_in_ms – This is a CQL table property that specifies the number of milliseconds after which a memtable should be flushed. This property is specified on table creation.
and
Upon flushing the memtable to disk, CommitLogSegments containing data for CDC-enabled tables are moved to the configured cdc_raw directory.
Putting those together, I would expect that by setting memtable_flush_period_in_ms = 10000, Cassandra flushes its CDC changes to disk every 10 seconds, which is what I want to accomplish.
My configuration
Based on the above, I would expect the memtable to be flushed to the cdc_raw directory every 10 seconds. I'm using the following configuration:
cassandra.yaml:
cdc_enabled: true
commitlog_segment_size_in_mb: 1
commitlog_total_space_in_mb: 2
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
table configuration:
memtable_flush_period_in_ms = 10000
cdc = true
Problem
The memtable is not flushed periodically to the cdc_raw directory; instead, it gets flushed to the commitlog directory when a certain size threshold is reached.
In detail, the following happens:
When a commit log segment reaches 1 MB, it is flushed to the commitlog directory. The commitlog directory holds at most 2 commit log files (see commitlog_total_space_in_mb: 2). When this threshold is reached, the oldest commit log file in the commitlog directory is moved to the cdc_raw directory.
Question
How can I flush Cassandra CDC changes to disk periodically?
CDC in the current version of Apache Cassandra is tricky.
The commit log is 'global': changes to every table go to the same commit log.
A commit log segment is moved to the cdc_raw directory only after all the mutations in that segment have been flushed. So even if you configure your CDC-enabled table to flush every 10 seconds, unflushed mutations from other tables in the same commit log segment prevent that segment from being moved to the CDC directory.
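To make that rule concrete, here is a toy model in Python. It is not Cassandra code: the class and method names (Segment, write, flush_table, eligible_for_cdc_raw) are hypothetical, and it models only the rule stated above, ignoring everything else Cassandra checks before it discards a segment.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy model of one commit log segment shared by all tables (hypothetical, not Cassandra's API)."""
    name: str
    # Tables that still have unflushed mutations in this segment.
    dirty_tables: set = field(default_factory=set)
    # Whether any mutation in this segment belongs to a CDC-enabled table.
    has_cdc_data: bool = False

    def write(self, table: str, cdc: bool) -> None:
        self.dirty_tables.add(table)
        self.has_cdc_data = self.has_cdc_data or cdc

    def flush_table(self, table: str) -> None:
        # Flushing a table's memtable marks that table's data in this segment as clean.
        self.dirty_tables.discard(table)

    def eligible_for_cdc_raw(self) -> bool:
        # Simplified rule from the answer: the segment can move to cdc_raw only
        # when it holds CDC data and EVERY table with data in it has been flushed.
        return self.has_cdc_data and not self.dirty_tables


seg = Segment("segment-1")  # hypothetical segment name
seg.write("ks.cdc_table", cdc=True)
seg.write("ks.other_table", cdc=False)

seg.flush_table("ks.cdc_table")     # e.g. the 10 s memtable_flush_period_in_ms fires
print(seg.eligible_for_cdc_raw())   # False: ks.other_table is still dirty

seg.flush_table("ks.other_table")   # e.g. a flush forced by commit log space pressure
print(seg.eligible_for_cdc_raw())   # True
```

The periodic flush of the CDC table alone never makes the shared segment eligible; some other flush has to clean the remaining tables first.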
There is no way to change this behavior other than trying to speed up the process by reducing commitlog_segment_size_in_mb
(but be careful not to reduce it below the size of your largest single write request).
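One concrete limit behind that caution: as I read the comments in cassandra.yaml, max_mutation_size_in_kb defaults to half of commitlog_segment_size_in_mb (converted to KB), and a mutation larger than that is rejected. The sketch below just does that arithmetic; both helper names are hypothetical, and it ignores per-entry overhead in the commit log.

```python
def default_max_mutation_kb(commitlog_segment_size_in_mb: int) -> int:
    """Default max_mutation_size_in_kb when it is not set explicitly:
    half of the commit log segment size, expressed in KB."""
    return commitlog_segment_size_in_mb * 1024 // 2


def write_fits(mutation_kb: float, commitlog_segment_size_in_mb: int) -> bool:
    """Hypothetical helper: would a single mutation of this size stay under the default limit?"""
    return mutation_kb <= default_max_mutation_kb(commitlog_segment_size_in_mb)


for segment_mb in (32, 8, 1):
    print(segment_mb, "MB segment ->", default_max_mutation_kb(segment_mb), "KB max mutation")
```

With the 1 MB segments from the question, any single write much above 512 KB would already be at risk.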
This behavior has been improved, and the improvement will be released in the next major version, 4.0. There, you can read your CDC data as soon as the commit log is synced to disk (so with periodic commit log sync, you can read your changes every commitlog_sync_period_in_ms milliseconds).
See CASSANDRA-12148 for details.
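As I understand the 4.0 CDC documentation, each segment in cdc_raw is accompanied by a `<segment name>_cdc.idx` file. Its first line is the offset up to which the segment has been durably written, and a second line reading COMPLETED is added once the segment is finished. A consumer can poll that file instead of waiting for the whole segment. Below is a minimal Python sketch of parsing it; the file format is assumed as just described, the names read_cdc_index and CdcIndex are hypothetical, and the sketch does not decode the commit log data itself.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class CdcIndex(NamedTuple):
    offset: int       # bytes of the segment that are safe to read
    completed: bool   # True once the segment will receive no more writes


def read_cdc_index(idx_path: Path) -> CdcIndex:
    """Parse a 4.0-style <segment>_cdc.idx file (format assumed, see above)."""
    fields = idx_path.read_text().split()
    offset = int(fields[0])
    completed = len(fields) > 1 and fields[1] == "COMPLETED"
    return CdcIndex(offset, completed)


# Usage demo against a temporary file standing in for a real index file.
with tempfile.TemporaryDirectory() as d:
    idx = Path(d) / "segment-1_cdc.idx"   # hypothetical segment name
    idx.write_text("1048576\n")
    print(read_cdc_index(idx))            # CdcIndex(offset=1048576, completed=False)
    idx.write_text("1048576\nCOMPLETED\n")
    print(read_cdc_index(idx))            # CdcIndex(offset=1048576, completed=True)
```

A reader would process bytes up to `offset`, remember where it stopped, and treat the segment as done once `completed` is true.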
By the way, you set commitlog_total_space_in_mb
to 2, which I definitely do not recommend. What you are seeing right now is Cassandra flushing every table to make room whenever your total commit log size exceeds this value. If the commit log space cannot be reclaimed, Cassandra will start throwing errors and rejecting writes.
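A rough calculation shows how tight the question's settings are: commitlog_total_space_in_mb divided by commitlog_segment_size_in_mb approximates how many full segments can exist before Cassandra has to flush memtables to reclaim space. The helper name is hypothetical and the calculation ignores partially filled segments; it is a back-of-the-envelope sketch, not Cassandra's actual accounting.

```python
def segments_before_forced_flush(total_space_mb: int, segment_size_mb: int) -> int:
    """Rough number of full segments that fit in commitlog_total_space_in_mb."""
    return total_space_mb // segment_size_mb


# The question's settings: only 2 segments (about 2 MB of writes across ALL
# tables) before Cassandra must flush memtables to free commit log space.
print(segments_before_forced_flush(2, 1))      # 2
# A larger budget, e.g. 8192 MB with 32 MB segments, leaves far more headroom.
print(segments_before_forced_flush(8192, 32))  # 256
```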