Reputation: 8360
I run a query in cqlsh against a Cassandra cluster with 5 nodes, and it fails with an OperationTimedOut error. If I slightly modify the parameter in the WHERE clause, even by a single character, I get an empty result, which is what I expect. Only this exact parameter value times out. Why is that?
query:
select * from table where pid = '5f334fef-2629-484c-a081-c4a6f554c6ab'
Here is the table schema:
CREATE TABLE dmp.interest_data (
pid text,
attribute text,
country text,
day_count int,
first_seen timestamp,
flag int,
keys set<text>,
last_seen timestamp,
score int,
usage_count int,
PRIMARY KEY (pid, attribute)
) WITH CLUSTERING ORDER BY (attribute ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy', 'max_threshold': '32'}
AND compression = {'chunk_length_kb': '256', 'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 172800
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX interest_data_attribute_idx ON dmp.interest_data (attribute);
CREATE INDEX interest_data_country_idx ON dmp.interest_data (country);
CREATE INDEX interest_data_day_count_idx ON dmp.interest_data (day_count);
CREATE INDEX interest_data_first_seen_idx ON dmp.interest_data (first_seen);
CREATE INDEX interest_data_usage_count_idx ON dmp.interest_data (usage_count);
Update: The pid value in the WHERE clause should exist in the table, because it was inserted with a query that returned no errors. Yet selecting it times out. Then something strange happened: I tried deleting it, and the delete succeeded. When I selected the same pid afterwards, I got an empty result. So the row was indeed there, but apparently in some corrupted form that led to the timeout. How can something like this happen?
Upvotes: 0
Views: 95
Reputation: 728
Re: Update on delete success and corruption question.
This can certainly happen when you query and insert with consistency level ONE (as mentioned in the comment), assuming the keyspace has a replication factor higher than 1 (usually 3). One or two of the replica nodes for that key may have been down or unhealthy during the insert, and sometimes (cluster under load, maintenance issues, etc.) replication doesn't do its job and the data isn't copied to the other replicas.
When this happens, only a repair operation (or nothing at all) can help fix the issue.
The result is that one or two servers that are supposed to hold the row don't actually have it, which can cause all kinds of weird failure scenarios.
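One way to check whether the replicas disagree, as a rough sketch: in cqlsh, raise the read consistency level and repeat the query. This assumes the table is dmp.interest_data from the question's schema and a replication factor above 1. Note that a read at ALL needs every replica to respond, so it will itself fail if a replica is down.

```cql
-- Require every replica of the partition to take part in the read.
CONSISTENCY ALL;

SELECT pid, attribute
FROM dmp.interest_data
WHERE pid = '5f334fef-2629-484c-a081-c4a6f554c6ab';

-- Switch back to the lower level used in the question afterwards.
CONSISTENCY ONE;
```

If the read at ALL returns rows that a read at ONE sometimes misses, the replicas were probably out of sync for that key.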
I don't have a good explanation for the timeout, unless the row has a very large number of columns and the query just doesn't finish in time.
If this happens again, try adding a LIMIT clause. Start with 1; if that works, it is probably a very long query that times out naturally.
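A minimal sketch of that check in cqlsh, assuming the table is dmp.interest_data from the question's schema. TRACING ON is a cqlsh command that prints where a request spent its time, which may help tell a slow replica apart from a very large partition.

```cql
-- Print a trace for each request (nodes involved, elapsed time per step).
TRACING ON;

-- Fetch a single row from the partition first.
SELECT pid, attribute
FROM dmp.interest_data
WHERE pid = '5f334fef-2629-484c-a081-c4a6f554c6ab'
LIMIT 1;

-- If that returns quickly, count the rows in the partition to gauge its size.
-- On a very large partition this count may itself time out.
SELECT COUNT(*)
FROM dmp.interest_data
WHERE pid = '5f334fef-2629-484c-a081-c4a6f554c6ab';

TRACING OFF;
```

If LIMIT 1 succeeds while the unrestricted query times out, the size of the partition is the more likely cause than a missing replica.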
Upvotes: 0
Reputation: 4031
Check the status of your nodes (for example with nodetool status). Changing the value you query for changes which node owns it, so most likely one of your nodes is having issues and the value that times out is owned by that node. When you change the value, the new one is owned by a different node, so it doesn't time out.
Upvotes: 1