Reputation: 2747
In Cassandra, data is usually denormalized to match the query pattern. With vector columns, however, that means duplicating the same vectors across tables. I understand that vector similarity search and indexing are expensive. Will a large amount of duplication affect performance? Is there a better way to model data with vector columns?
Upvotes: 0
Views: 47
Reputation: 16353
Vector searches operate on vector embeddings stored in columns with the CQL vector data type. More importantly, a vector search queries the index on the vector data of a single table. Vector searches do not span multiple indexes or tables.
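To make the single-table scope concrete, here is a minimal sketch in Python that only assembles CQL strings; it does not connect to a cluster. It assumes the Cassandra 5.0 syntax for the vector type, a storage-attached index (SAI) and ANN ordering. The keyspace, table, column and index names (ks.products, embedding, products_ann_idx) and the 3-dimensional vector are hypothetical, chosen purely for illustration.

```python
# Hypothetical schema: one table with one vector column and one SAI index on it.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS ks.products ("
    "id uuid PRIMARY KEY, name text, embedding vector<float, 3>);"
)
CREATE_INDEX = (
    "CREATE CUSTOM INDEX IF NOT EXISTS products_ann_idx "
    "ON ks.products (embedding) USING 'StorageAttachedIndex';"
)


def ann_query(table: str, column: str, query_vector: list[float], limit: int = 5) -> str:
    """Build a CQL ANN query; note that it names exactly one table and one column."""
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return f"SELECT * FROM {table} ORDER BY {column} ANN OF {literal} LIMIT {limit};"


if __name__ == "__main__":
    print(CREATE_TABLE)
    print(CREATE_INDEX)
    print(ann_query("ks.products", "embedding", [0.1, 0.2, 0.3], limit=2))
```

The generated query reads only the index on ks.products; a copy of the embedding column in some other table would not be consulted by it.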
Data duplicated across denormalised tables has no bearing on the performance of vector searches in Cassandra, since each search only queries the vector index of a single table.
As a side note, unless you have a specific requirement to store the same vector embeddings in different tables, you should avoid duplicating the vector columns. Duplicate copies of a vector column in multiple tables will not slow down searches; their only impact is increased storage utilisation. Cheers!
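For a rough sense of that storage cost, here is a back-of-the-envelope sketch in Python. It assumes each element of a vector<float, N> column takes 4 bytes (a 32-bit float) and deliberately ignores compression, replication factor, per-row overhead and any index built on the copies, so treat the result as an order-of-magnitude estimate only. The function name and the example figures are hypothetical.

```python
FLOAT_BYTES = 4  # assumption: CQL float is a 32-bit value


def extra_vector_bytes(rows: int, dimensions: int, copies: int) -> int:
    """Raw bytes spent on duplicate copies of a vector column.

    `copies` is the total number of tables holding the same vectors;
    only the copies beyond the first count as extra storage.
    """
    if rows < 0 or dimensions < 0 or copies < 1:
        raise ValueError("rows and dimensions must be >= 0 and copies >= 1")
    return rows * dimensions * FLOAT_BYTES * (copies - 1)


if __name__ == "__main__":
    # Hypothetical workload: 1M rows, 1536-dimensional embeddings, 3 tables.
    extra = extra_vector_bytes(rows=1_000_000, dimensions=1536, copies=3)
    print(f"{extra / 1e9:.1f} GB of extra raw vector data")
```

With these example numbers the two additional copies add roughly 12.3 GB of raw vector payload per replica, which is why keeping the embedding in one table and storing only its key in the denormalised tables is usually the cheaper layout.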
Upvotes: 0