Reputation: 67
I have created a TABLE and index As follows
CREATE TABLE refresh_token ( user_id bigint, refresh_token text, access_token text, device_desc text, device_type text, expire_time timestamp, org_id bigint, PRIMARY KEY (user_id, refresh_token) ) WITH CLUSTERING ORDER BY (refresh_token ASC) CREATE INDEX i_access_token ON demodb.refresh_token (access_token);
After i insert or delete data about millions times.I'm found when i user the follow query can not return any data. Actually,there has this row in the data.
when i query by PRIMARY KEY
select * from refresh_token where user_id=405198 and refresh_token='E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298';
it returns data:
select * from refresh_token where user_id=405198 and refresh_token='E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298'; user_id | refresh_token | access_token | device_desc | device_type | expire_time | org_id ---------+------------------------------------------------------------------+------------------------------------------------------------------+-------------+-------------+--------------------------+-------------- 405198 | E82B57D9D64BECDBD6B5602A72816BD19016323504F803116F66A32598E04298 | E82B57D9D64BECDB16D4F3F9F81AC0EF7AF2C4B460CB0F33C9CEFA5846BA7BE1 | null | null | 2016-06-07 14:09:52+0800 | 481036337156
but when i query by secondary index,it return null.
select * from refresh_token where access_token ='E82B57D9D64BECDB16D4F3F9F81AC0EF7AF2C4B460CB0F33C9CEFA5846BA7BE1'; user_id | refresh_token | access_token | device_desc | device_type | expire_time | org_id ---------+---------------+--------------+-------------+-------------+-------------+--------
thanks
Upvotes: 1
Views: 517
Reputation: 9475
Secondary indexes are suggested only for fields with low cardinality. Your access_token field looks like it has very high cardinality (and may even be unique for all million rows). This is a known anti pattern in Cassandra.
High cardinality fields are good for things like partition keys because they will hash to a known location. But secondary indexes are not hashed and are found via local data structures on each node. These local data structures become cumbersome and inefficient when there are a lot of different values being indexed. I suspect you are hitting an internal timeout before the node with the matching access_token is finding the needle in the haystack.
If you need to find data by access_token, I'd suggest creating a second table where access_token is the partition key and use it to look up the corresponding user_id and refresh_token. That way you will be using access_token as a hash and will get reliable and quick look ups.
Upvotes: 1