Reputation: 5175
I created a minimal example of users
TABLE
on Cassandra 2.0.9
database. I can use SELECT to select all its rows, but I do not understand why adding my WHERE
clause (on indexed column) returns 0 rows.
(I also do not get why 'CONTAINS' statement causes an error here, as presented below, but let's assume this is not my primary concern. )
DROP TABLE IF EXISTS users;
CREATE TABLE users
(
KEY varchar PRIMARY KEY,
password varchar,
gender varchar,
session_token varchar,
state varchar,
birth_year bigint
);
INSERT INTO users (KEY, gender, password) VALUES ('jessie', 'f', 'avlrenfls');
INSERT INTO users (KEY, gender, password) VALUES ('kate', 'f', '897q7rggg');
INSERT INTO users (KEY, gender, password) VALUES ('mike', 'm', 'mike123');
CREATE INDEX ON users (gender);
DESCRIBE TABLE users;
Output:
CREATE TABLE users
(
key text,
birth_year bigint,
gender text,
password text,
session_token text,
state text,
PRIMARY KEY ((key))
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX users_gender_idx ON users (gender);
This SELECT
works OK:
SELECT * FROM users;
key | birth_year | gender | password | session_token | state
-------+------------+--------+-----------+---------------+-------
kate | null | f | 897q7rggg | null | null
jessie | null | f | avlrenfls | null | null
mike | null | m | mike123 | null | null
And this one does not:
SELECT * FROM users WHERE gender = 'f';
(0 rows)
This also fails:
SELECT * FROM users WHERE gender CONTAINS 'f';
Bad Request: line 1:33 no viable alternative at input 'CONTAINS'
Upvotes: 1
Views: 1268
Reputation: 57808
It sounds like your index may have become corrupt. Try rebuilding it. Run this from a command prompt:
nodetool rebuild_index yourKeyspaceName users users_gender_idx
However, the larger issue here is that secondary indexes are known to perform poorly. Some have even identified their use as an anti-pattern. DataStax has a document designed to guide you in appropriate use of secondary indexes. And this is definitely not one of them.
creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.
While gender may not be a boolean column, it has the same cardinality. A secondary index on this column is a terrible idea.
If querying by gender is something you really need to do, then you may need to find a different way to model or partition your data. For instance, PRIMARY KEY (state, gender, key)
will allow you to query gender by state.
SELECT * FROM users WHERE state='WI' and gender='f';
That would return all female users from the state of Wisconsin. Of course, that would mean you would also have to query all states individually. But the bottom line, is that Cassandra does not handle queries for low cardinality keys/indexes well, so you have to be creative in how you solve these types of problems.
Upvotes: 2