Reputation: 658
I am currently working with the following setup:
My challenge involves performing a vector search across a large number of IDs, anywhere from 10,000 to 500,000. For instance, a query might match 100,000 documents, but only 70,000 of those are available in my inventory. I need to filter down to the available items based on this information, which I have in the form of a bitset.
The current workflow is as follows:
Query Milvus: Execute a query within Milvus to retrieve document IDs for matched vectors without availability consideration.
Post-Processing: Apply post-filtering using the bitset to keep only those documents whose availability bit is set to '1'. This is an O(n) operation, where n is the number of matched documents, as sketched below.
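For reference, here is a minimal sketch of what I am doing today (pymilvus assumed; the collection name "docs", the fields "doc_id" and "embedding", and the search parameters are just placeholders):

```python
# Current two-step workflow: ANN search in Milvus, then an O(n) post-filter
# against an external availability bitset (indexable by doc_id).
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("docs")  # assumes the collection is indexed and loaded

def search_then_filter(query_vec, availability_bitset, limit=1000):
    # Step 1: plain ANN search with no availability filter.
    results = collection.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=limit,
        output_fields=["doc_id"],
    )
    # Step 2: post-filter, O(n) in the number of matched documents.
    available = []
    for hit in results[0]:
        doc_id = hit.entity.get("doc_id")
        if availability_bitset[doc_id]:  # bit '1' means the item is in inventory
            available.append(doc_id)
    return available
```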
I am seeking a method or strategy that either reduces this operation to O(1) time complexity or allows me to directly fetch available items via Milvus without needing any post-filtering.
Milvus does support filtering with bitsets (applying filters prior to running the actual approximate nearest neighbor (ANN) search). As I already have an availability bitset on hand, is there any way to utilize this within Milvus?
I appreciate any guidance or suggestions on how this can be achieved efficiently.
Upvotes: 1
Views: 178
Reputation: 21
You can filter directly with an expression such as expr="inventoryid in [1,2,3,5,8, ...]". I'm pretty sure filtering on such a long list will not be efficient, though; it is much better if you can express the condition as a range, e.g. inventoryid > 100.
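A minimal sketch of what that could look like with pymilvus (the collection name, the "inventoryid" field, and the search parameters are placeholders, not taken from your setup):

```python
# Push the availability filter into the search call via "expr",
# so Milvus applies it before the ANN search.
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("docs")

available_ids = [1, 2, 3, 5, 8]   # IDs whose availability bit is set to 1
query_vec = [0.0] * 768           # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=1000,
    expr=f"inventoryid in {available_ids}",  # likely slow for very long ID lists
)
```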
You cannot mmap your bitset onto the data inside Milvus, because bitset offsets are segment-level and users have no visibility into how the data is split across multiple segments.
Upvotes: 0
Reputation: 71
Will the bitset change with query patterns? If you have a static bitset map (with available items set to "1"), I think you can store it as a scalar field. A newer version of Milvus will introduce a new index type, a bitmap index, which seems like it could handle your case.
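A rough sketch of the scalar-field approach with pymilvus (the field name "available", the vector dimension, and the search parameters are assumptions for illustration):

```python
# Materialize the availability bit as a boolean scalar field and filter on it
# at search time; Milvus applies the boolean filter before the ANN search.
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema("doc_id", DataType.INT64, is_primary=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("available", DataType.BOOL),  # one availability bit per entity
]
collection = Collection("docs_with_availability", CollectionSchema(fields))

query_vec = [0.0] * 768  # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=1000,
    expr="available == true",  # only entities with the bit set are searched
)
```

The trade-off is that whenever availability changes, the corresponding entities have to be updated in Milvus rather than in the external bitset.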
Upvotes: 1