deenbandhu

Reputation: 599

High read latency in Cassandra

I am using Cassandra 2.1.12 on a cluster of three machines, each with 32 GB of RAM and 4 cores (on Amazon AWS).

I am using the default Cassandra configuration throughout.

I am using it for my website's event analysis (time-series data), with around 1 GB of new data per day and a replication factor of 3.

My data has grown to around 85 GB on each machine, and it now shows a read latency of around 4.5 s (4000 ms).

My rows are rarely updated, so I am not using leveled compaction (LeveledCompactionStrategy). My writes are performing well, with a latency of around 0.03 ms.

Edit:

Here is the definition of the column family:

CREATE TABLE TimeSeriesData(
logyear int,
logmonth int,
logdate int,
logdatetime timestamp,
cookie text,
sessionid text,
...
PRIMARY KEY (logyear, logmonth, logdate, logdatetime, cookie)
) WITH CLUSTERING ORDER BY (logmonth ASC, logdate ASC, logdatetime ASC, cookie ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

My partition key is currently logyear, so all of my data ends up in a single partition. That said, the partitioner is responsible for distributing groups of rows (by partition key) across the nodes in the cluster.

In this case, would all the data be on a single node or not?

Also, why was the read latency so poor despite reading the data from a single partition?

Can a single SSTable contain multiple partitions, and vice versa?

I am using org.apache.cassandra.dht.RandomPartitioner.
Moreover, what would be the ideal partition key for the column family above, given incremental data of 1 GB daily?

Upvotes: 2

Views: 7802

Answers (1)

Jeff Jirsa

Reputation: 4426

You're posting what you believe to be a single problem, but it's probably far more involved - potentially many different problems, all manifesting as high latency.

The most likely explanation is heavy garbage collection caused by a poor data model. However, you've given us very little to go on.

Look at nodetool cfstats - do the latencies in cfstats match the latencies you see? What's the maximum partition size?
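
To get a feel for the partition-size question before looking at cfstats, here is a rough back-of-the-envelope sketch in Python. It assumes the 1 GB/day figure from the question is spread evenly over the day and ignores compression. The bucket choices are illustrative only, and the `hour` component is hypothetical (it is not a column in the posted schema). Treat the numbers as order-of-magnitude estimates, not as what cfstats will report.

```python
# Rough partition-size estimate for time-bucketed partition keys.
# Assumption: ~1 GB of new data per day (figure from the question),
# arriving evenly, measured before compression.
DAILY_BYTES = 1 * 1024**3

# Hypothetical partition-key choices and how many partitions each one
# creates per day. A value below 1 means one partition spans many days.
# "hour" is an illustrative extra component, not part of the posted schema.
BUCKETS_PER_DAY = {
    "logyear": 1 / 365,
    "logyear, logmonth": 1 / 30,
    "logyear, logmonth, logdate": 1,
    "logyear, logmonth, logdate, hour": 24,
}


def partition_size_bytes(buckets_per_day, daily_bytes=DAILY_BYTES):
    """Approximate size of one fully written partition, in bytes."""
    return daily_bytes / buckets_per_day


def human(n):
    """Format a byte count with a binary unit suffix."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} PB"


for key, per_day in BUCKETS_PER_DAY.items():
    size = partition_size_bytes(per_day)
    print(f"({key}) -> about {human(size)} per partition")
```

With logyear alone as the partition key, a full year lands in one partition of several hundred GB under these assumptions, whereas a daily or finer bucket keeps each partition far smaller. Which granularity is appropriate depends on your read patterns, so compare these estimates with the maximum partition size that cfstats actually reports.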

Upvotes: 4
