Reputation: 165
I was testing node repair on my Cassandra cluster (v3.11.5) while simultaneously stress-testing it with cassandra-stress (v3.11.4). The nodes ran out of disk space and the repair failed. As a result, gossip got disabled on the nodes. SSTables that were being anticompacted got cleaned up (effectively deleted), which dropped the disk usage by roughly half (to ~1.5TB per node) within a minute. That part I understand.
What I do not understand is what happened next. The SSTables kept getting compacted into smaller ones and eventually deleted, so the disk usage continued to drop (this time slowly); after a day or so it went from ~1.5TB per node to ~50GB per node. The data in the cluster was randomly generated by cassandra-stress, so I have no way to confirm whether it is intact, but I find it highly unlikely given how much the disk usage dropped. Also, I have no TTL set up (at least none that I know of; I might be missing something), so I would not expect data to be deleted, yet that appears to be what is happening.
Anyway, can anyone point me to what is happening?
Table schema:
> desc test-table1;
CREATE TABLE test-keyspace1.test-table1 (
event_uuid uuid,
create_date timestamp,
action text,
business_profile_id int,
client_uuid uuid,
label text,
params text,
unique_id int,
PRIMARY KEY (event_uuid, create_date)
) WITH CLUSTERING ORDER BY (create_date DESC)
AND bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.DeflateCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Logs:
DEBUG [CompactionExecutor:7] 2019-11-23 20:17:19,828 CompactionTask.java:255 - Compacted (59ddec80-0e20-11ea-9612-67e94033cb24) 4 sstables to [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big,] to level=0. 93.264GiB to 25.190GiB (~27% of original) in 5,970,059ms. Read Throughput = 15.997MiB/s, Write Throughput = 4.321MiB/s, Row Throughput = ~909/s. 1,256,595 total partitions merged to 339,390. Partition merge counts were {2:27340, 3:46285, 4:265765, }
(...)
DEBUG [CompactionExecutor:7] 2019-11-24 03:50:14,820 CompactionTask.java:255 - Compacted (e1bd7f50-0e4b-11ea-9612-67e94033cb24) 32 sstables to [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3301-big,] to level=0. 114.787GiB to 25.150GiB (~21% of original) in 14,448,734ms. Read Throughput = 8.135MiB/s, Write Throughput = 1.782MiB/s, Row Throughput = ~375/s. 1,546,722 total partitions merged to 338,859. Partition merge counts were {1:12732, 2:42441, 3:78598, 4:50454, 5:36032, 6:52989, 7:21216, 8:34681, 9:9716, }
DEBUG [CompactionExecutor:15] 2019-11-24 03:50:14,852 LeveledManifest.java:423 - L0 is too far behind, performing size-tiering there first
DEBUG [CompactionExecutor:15] 2019-11-24 03:50:14,852 CompactionTask.java:155 - Compacting (85e06040-0e6d-11ea-9612-67e94033cb24) [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3299-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3298-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3300-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3301-big-Data.db:level=0,]
(...)
DEBUG [NonPeriodicTasks:1] 2019-11-24 06:02:50,117 SSTable.java:105 - Deleting sstable: /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big
Edit:
I did some additional testing. To the best of my knowledge there is no TTL set up; see this query result taken right after cassandra-stress started inserting data:
> SELECT event_uuid, create_date, ttl(action), ttl(business_profile_id), ttl(client_uuid), ttl(label), ttl(params), ttl(unique_id) FROM test-table1 LIMIT 1;
event_uuid | create_date | ttl(action) | ttl(business_profile_id) | ttl(client_uuid) | ttl(label) | ttl(params) | ttl(unique_id)
--------------------------------------+---------------------------------+-------------+--------------------------+------------------+------------+-------------+----------------
00000000-001b-adf7-0000-0000001badf7 | 2018-01-10 10:08:45.476000+0000 | null | null | null | null | null | null
So neither TTL nor tombstone deletion should be related to the issue. There are likely no duplicates either, as the data is highly randomized. No replication factor changes were made either.
What I did find out is that the data volume starts decreasing every time cassandra-stress is stopped. Sadly, I still don't know the exact reason.
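As a double-check for per-cell TTLs or tombstones at the SSTable level, this is a rough sketch of what I have in mind (assuming the Cassandra tools are on the PATH, and picking one of the SSTable files named in the logs above):
# sstablemetadata ships with Cassandra 3.11 (tools/bin on tarball installs);
# the exact output field names vary slightly between versions.
sstablemetadata /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3301-big-Data.db | grep -iE 'droppable|deletion|ttl'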
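One way to watch that correlation is to poll the compaction backlog and the on-disk size together (just a sketch; data path as in the logs above):
# Show pending/active compactions and the keyspace's on-disk footprint every minute
watch -n 60 'nodetool compactionstats -H; du -sh /data/cassandra/data/test-keyspace1'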
Upvotes: 0
Views: 406
Reputation: 2196
I guess, when you think of it from a Cassandra perspective, there really are only a few reasons why your data could shrink:
1) TTLs expired past GC grace
2) Deletes past GC grace
3) The same records exist in multiple SSTables (i.e. "updates")
4) A change in RF to a lower number (essentially a "cleanup" - token reassignment)
In any of those cases, when compaction runs it will either remove or reconcile records, which can shrink the space consumption. Without the SSTables around any more, it's hard to determine which one, or which combination of the above, occurred.
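If the cluster is still up, each of those is cheap to check. A rough sketch (assuming cqlsh/nodetool access; substitute your real keyspace/table names for the anonymized ones in the question):
# 1) and 2): table-level default TTL and GC grace settings
cqlsh -e "SELECT default_time_to_live, gc_grace_seconds FROM system_schema.tables WHERE keyspace_name='test_keyspace1' AND table_name='test_table1';"
# 4): keyspace replication settings
cqlsh -e "SELECT replication FROM system_schema.keyspaces WHERE keyspace_name='test_keyspace1';"
# 3): rough sense of overwrites - compare the partition estimate with the
#     number of distinct keys cassandra-stress was asked to write
nodetool tablestats test_keyspace1.test_table1 | grep -i 'Number of partitions'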
-Jim
Upvotes: 0