Xingdi

Reputation: 81

Milvus hybrid search cannot find the record I want

I am using Milvus as a vector database. Each record has the fields [mailno, address, embedding], where mailno and address are strings and embedding is a float vector. I inserted about 1300k records into the collection without partitioning the data, built an index, and loaded it.

  1. First, I searched with a query vector. The returned hits are far from the query, and the distance values are not small. Because my data consists of addresses, it is easy to judge how similar two address strings are.
  2. Then I looked for a record that should be close to the query (e.g., same city, town, road, etc.) and used 'mailno like "xxx"' as the filter in a hybrid search. My goal was to restrict the search to that specific record so that I could check its distance value. However, the filter does not seem to work: the returned hits are the same as those without the filter.
  3. When I run the query alone, the correct record is returned, which means the query parameters are correct.

Now I am confused: why is the hybrid search not working? Is it because the dataset is too large? I tried the search on a small dataset, and the embeddings and distances were fine. The embedding algorithm takes geographic information into account.

Upvotes: 0

Views: 486

Answers (2)

ken zhang

Reputation: 71

Milvus 2.4 now supports fuzzy matching as well:

filter="mailno like '%xxx%'"
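As a rough sketch of how such a filter could be passed to a vector search: the helper names `like_filter` and `filtered_search` below are made up for illustration, and `client` is assumed to be a connected `pymilvus.MilvusClient`. The field names come from the question. The sketch simply refuses text containing quotes or `%` rather than trying to escape it.

```python
def like_filter(field: str, text: str, mode: str = "prefix") -> str:
    """Build a LIKE filter expression string (hypothetical helper)."""
    # Keep the sketch simple: refuse characters that would need escaping.
    if "'" in text or '"' in text or "%" in text:
        raise ValueError("text must not contain quotes or % in this sketch")
    if mode == "prefix":
        pattern = f"{text}%"    # matches values that start with text
    elif mode == "infix":
        pattern = f"%{text}%"   # matches values that contain text
    else:
        raise ValueError("mode must be 'prefix' or 'infix'")
    return f"{field} like '{pattern}'"


def filtered_search(client, collection_name, query_vector, mailno_part, limit=5):
    """Run a vector search restricted by a mailno filter.

    Assumes `client` is a connected pymilvus.MilvusClient and that the
    collection has the mailno/address fields described in the question.
    """
    return client.search(
        collection_name=collection_name,
        data=[query_vector],
        filter=like_filter("mailno", mailno_part, mode="infix"),
        limit=limit,
        output_fields=["mailno", "address"],
    )
```

If the hits still ignore the filter, printing the string returned by `like_filter` is a quick way to confirm what expression was actually sent.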

Upvotes: 1

Christy

Reputation: 66

Try prefix matching, "... like 'prefix%'":

filter="mailno like 'xxx%'"

More info: https://milvus.io/blog/2022-08-08-How-to-use-string-data-to-empower-your-similarity-search-applications.md

The fastest metadata text filters are prefix matching and list membership, e.g. "my_varchar_metada_column in ['list', 'of', 'strings']".
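A minimal sketch of building such a list-membership filter string in Python. The helper name `in_list_filter` is hypothetical, and the example mailno values are placeholders, not real data. As a simplification, values containing quotes are rejected instead of escaped.

```python
def in_list_filter(field: str, values) -> str:
    """Build a '<field> in [...]' filter expression (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    for v in values:
        # Keep the sketch simple: refuse strings that would need escaping.
        if "'" in v or '"' in v:
            raise ValueError("values must not contain quotes in this sketch")
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"{field} in [{quoted}]"


# Placeholder mailno values, for illustration only.
expr = in_list_filter("mailno", ["A001", "A002"])
print(expr)  # mailno in ['A001', 'A002']
```

The resulting string can be used wherever the answers above pass a `filter` expression.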

For more examples, see cell 14 of this bootcamp notebook: https://github.com/milvus-io/bootcamp/blob/master/bootcamp/RAG/readthedocs_zilliz_langchain.ipynb

Upvotes: 0
