Reputation: 658
I am currently working with the following setup:
My challenge involves performing a vector search across a large number of IDs, anywhere from 10,000 to 500,000. For instance, a query might match 100,000 documents, but only 70,000 of those are available in my inventory. I need to filter down to the available items based on this information, which I have in the form of a bitset.
The current workflow is as follows:
Query Milvus: Execute a query within Milvus to retrieve document IDs for matched vectors without availability consideration.
Post-Processing: Apply post-filtering using the bitset to keep only those documents whose availability bit is set to '1'. This is an O(n) operation, where n is the number of matched documents, as sketched below.
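For reference, here is a minimal sketch of what I am doing today (pymilvus assumed; the collection name "docs", the fields "doc_id" and "embedding", and the search parameters are just placeholders):

```python
# Current two-step workflow: ANN search in Milvus, then an O(n) post-filter
# against an external availability bitset (indexable by doc_id).
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("docs")  # assumes the collection is indexed and loaded

def search_then_filter(query_vec, availability_bitset, limit=1000):
    # Step 1: plain ANN search with no availability filter.
    results = collection.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=limit,
        output_fields=["doc_id"],
    )
    # Step 2: post-filter, O(n) in the number of matched documents.
    available = []
    for hit in results[0]:
        doc_id = hit.entity.get("doc_id")
        if availability_bitset[doc_id]:  # bit '1' means the item is in inventory
            available.append(doc_id)
    return available
```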
I am seeking a method or strategy that either reduces this operation to O(1) time complexity or allows me to directly fetch available items via Milvus without needing any post-filtering.
Milvus does support filtering with bitsets (applying filters prior to running the actual approximate nearest neighbor (ANN) search). As I already have an availability bitset on hand, is there any way to utilize this within Milvus?
I appreciate any guidance or suggestions on how this can be achieved efficiently.
Upvotes: 1
Views: 178
Reputation: 21
You can filter directly with an expression such as expr="inventoryid in [1,2,3,5,8, ...]". I'm pretty sure filtering on such a long list will not be efficient, though; it is much better if you can express the condition as a range, e.g. inventoryid > 100.
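A minimal sketch of what that could look like with pymilvus (the collection name, the "inventoryid" field, and the search parameters are placeholders, not taken from your setup):

```python
# Push the availability filter into the search call via "expr",
# so Milvus applies it before the ANN search.
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("docs")

available_ids = [1, 2, 3, 5, 8]   # IDs whose availability bit is set to 1
query_vec = [0.0] * 768           # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=1000,
    expr=f"inventoryid in {available_ids}",  # likely slow for very long ID lists
)
```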
You cannot mmap your bitset onto the data inside Milvus, because bitset offsets are segment-level and users have no visibility into how the data is split across multiple segments.
Upvotes: 0
Reputation: 71
Will the bitset change with query patterns? If you have a static bitset map (with available items set to "1"), I think you can store it as a scalar field. A newer version of Milvus will introduce a new index type, a bitmap index, which seems like it could handle your case.
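A rough sketch of the scalar-field approach with pymilvus (the field name "available", the vector dimension, and the search parameters are assumptions for illustration):

```python
# Materialize the availability bit as a boolean scalar field and filter on it
# at search time; Milvus applies the boolean filter before the ANN search.
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema("doc_id", DataType.INT64, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("available", DataType.BOOL),  # one availability bit per entity
]
collection = Collection("docs_with_availability", CollectionSchema(fields))

query_vec = [0.0] * 768  # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=1000,
    expr="available == true",  # only entities with the bit set are searched
)
```

The trade-off is that whenever availability changes, the corresponding entities have to be updated in Milvus rather than in the external bitset.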
Upvotes: 1