Bin Wang

Reputation: 2747

Does duplicated data from denormalization affect the performance of vector searches?

In Cassandra, data is usually denormalized to match the query pattern. With vector columns, however, that means duplicating the same vectors across tables. I know that vector similarity search and indexing are very expensive, so will a large amount of duplication affect performance? Is there a better way to model data with vector columns?

Upvotes: 0

Views: 47

Answers (1)

Erick Ramirez

Reputation: 16353

Vector searches operate on vector embeddings stored in columns of the CQL vector data type. More importantly, a vector search queries the index on the vector column of a single table. Vector searches do not span multiple indexes or tables.

Data duplicated across denormalised tables has no bearing on the performance of vector searches in Cassandra, since each search only queries the vector index on a single table.
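To illustrate the reasoning, here is a toy model in Python. It is not how Cassandra's vector index works internally: `ToyTable`, its brute-force `search` method, and the table names are all hypothetical, made up for this sketch. It only shows that when every table carries its own index, a search against one table never examines the copies held by another.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyTable:
    """Hypothetical stand-in for one table with its own vector index."""

    def __init__(self, name):
        self.name = name
        self.rows = {}         # primary key -> vector
        self.comparisons = 0   # vectors examined by searches on this table

    def insert(self, key, vector):
        self.rows[key] = vector

    def search(self, query, limit):
        # Brute-force scan (an assumption for simplicity); it only ever
        # reads the vectors stored in this table.
        scored = []
        for key, vector in self.rows.items():
            self.comparisons += 1
            scored.append((cosine_similarity(query, vector), key))
        scored.sort(reverse=True)
        return [key for _, key in scored[:limit]]


# Two denormalised tables holding duplicate copies of the same embeddings.
products_by_id = ToyTable("products_by_id")
products_by_category = ToyTable("products_by_category")

embeddings = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.7, 0.7]}
for key, vector in embeddings.items():
    products_by_id.insert(key, vector)
    products_by_category.insert(key, vector)

# Searching one table does no work against the other table's copies.
top = products_by_id.search([1.0, 0.1], limit=2)
print(top)
print(products_by_id.comparisons, products_by_category.comparisons)
```

In this sketch the duplicate rows in `products_by_category` cost memory but are never touched by the search on `products_by_id`.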

As a side note, unless you have a specific requirement to store the same vector embeddings in more than one table, you should avoid duplicating the vector columns. The duplicate copies will not slow down your searches; their only impact is increased storage utilisation. Cheers!

Upvotes: 0
