Performance Discrepancy in Cassandra Vector Similarity Search Queries with and without Filter

Question

I'm seeing a large performance difference between two vector similarity search queries in Cassandra. Here are the schema, the index, and the two queries:

CREATE TABLE cycling.feature (
    mall_id bigint,
    place_id bigint,
    hardware_id bigint,
    feature_desc_id bigint,
    occur_at timestamp,
    vc vector,
    PRIMARY KEY ((mall_id), place_id, hardware_id, occur_at, feature_desc_id)
) WITH CLUSTERING ORDER BY (place_id ASC, hardware_id ASC, occur_at DESC, feature_desc_id DESC);

CREATE INDEX IF NOT EXISTS feature_ann_index_cos
    ON cycling.feature(vc) USING 'sai'
    WITH OPTIONS = { 'similarity_function': 'cosine' };

Query with the mall_id filter:

SELECT similarity_cosine(vc, ?) AS sim
FROM cycling.feature
WHERE mall_id = ?
ORDER BY vc ANN OF ? LIMIT 1;

Query without the mall_id filter:

SELECT similarity_cosine(vc, ?) AS sim
FROM cycling.feature
ORDER BY vc ANN OF ? LIMIT 1;
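To make explicit what the two queries ask for, here is a minimal Python sketch of their semantics, written as an exact brute-force search over a tiny in-memory table. It is a conceptual model, not how SAI executes ANN queries. The row layout and the names `cosine` and `ann_limit_1` are hypothetical. The score is plain cosine similarity; Cassandra's `similarity_cosine` may report a rescaled value, but a monotonic rescaling would not change the ranking. Real ANN search is also approximate. In this model the `mall_id` restriction only changes the candidate set, so the filtered query returns the nearest vector within one partition instead of across the whole table.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical stand-in for a row of cycling.feature: (mall_id, feature_desc_id, vc).
# The real table has more clustering columns; they do not affect the ranking.
Row = Tuple[int, int, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Plain cosine similarity in [-1, 1]; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def ann_limit_1(rows: List[Row], query: List[float],
                mall_id: Optional[int] = None) -> Optional[Tuple[int, int, float]]:
    """Exact top-1 search ("ORDER BY vc ANN OF ? LIMIT 1").

    With mall_id=None every row is a candidate (the unfiltered query);
    otherwise only rows from that partition are scored (the filtered query).
    """
    candidates = [r for r in rows if mall_id is None or r[0] == mall_id]
    if not candidates:
        return None  # the partition is empty or does not exist
    best = max(candidates, key=lambda r: cosine(r[2], query))
    return best[0], best[1], cosine(best[2], query)
```

For example, with rows `(1, 10, [1.0, 0.0])`, `(1, 11, [0.6, 0.8])` and `(2, 20, [0.0, 1.0])` and the query vector `[0.0, 1.0]`, the unfiltered search picks the row from mall 2. Restricting to `mall_id=1` picks `feature_desc_id` 11 with a similarity of 0.8.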

The query with the mall_id filter is significantly slower than the one without it, even though both perform a vector similarity search.

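I expected the query with the mall_id filter to be faster than the one without it, because restricting on the partition key should limit the search to a single partition. Why is it slower?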
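A fair comparison of the two queries needs the same measurement method for both: prepared statements, a few discarded warm-up runs, and the median over many executions instead of a single timing. Below is a minimal, driver-agnostic Python sketch. `median_latency_ms` is a hypothetical helper, and `execute` can be any callable that takes a statement and its bound parameters.

```python
import statistics
import time
from typing import Any, Callable, Sequence


def median_latency_ms(execute: Callable[[Any, Sequence[Any]], Any], statement: Any,
                      params: Sequence[Any], runs: int = 20, warmup: int = 3) -> float:
    """Call execute(statement, params) repeatedly and return the median latency in ms.

    Warm-up calls are not timed, so one-off costs (connection setup, cold
    caches) do not skew the comparison between the two queries.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        execute(statement, params)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        execute(statement, params)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With the DataStax Python driver (`cassandra-driver`), `execute` could be `session.execute` and `statement` the result of `session.prepare(...)`. The filtered query binds three values (vector, `mall_id`, vector) and the unfiltered query binds two (vector, vector). Binding a Python list to a `vector` column assumes a driver version that supports the vector type. To see where each query spends its time, run a single execution with `session.execute(statement, params, trace=True)` and inspect `result.get_query_trace()`, or use `TRACING ON` in cqlsh.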
