Bobby M

Why are there occasional empty inner hits on nested kNN search?

I am having some issues using a nested kNN search. The problem is that occasionally, documents will be returned without any inner hits. How is this possible?

Search result (notice the two last hits have empty inner hits):

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [0.2234513587608724, 0.8878394741163076, 0.3087446303001422, 0.5401258921662346, -0.9228053400350715],
                                                  'some_text_field': 'This is nested text for doc number 383'},
                                'non_nested_text': 'This doc is number 383'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [{'_id': 'bRkrI4IBuKQL3UqO8DkV',
                                                                        '_index': 'synthetic_data_index',
                                                                        '_nested': {'field': 'nested_object', 'offset': 0},
                                                                        '_score': 2.0406117,
                                                                        '_source': {'cool_vector_field': [0.2234513587608724,
                                                                                                          0.8878394741163076,
                                                                                                          0.3087446303001422,
                                                                                                          0.5401258921662346,
                                                                                                          -0.9228053400350715],
                                                                                    'some_text_field': 'This is nested text for doc number 383'}}],
                                                              'max_score': 2.0406117,
                                                              'total': {'relation': 'eq', 'value': 1}}}}},
                   {'_id': 'bhkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [-0.3667193179233421, 0.04664013242577236, -0.4679759075333949, 0.9335512141017783, 0.9847209912260526],
                                                  'some_text_field': 'This is nested text for doc number 384'},
                                'non_nested_text': 'This doc is number 384'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}},
                   {'_id': 'bxkrI4IBuKQL3UqO8DkV',
                    '_index': 'synthetic_data_index',
                    '_score': 2.0406117,
                    '_source': {'nested_object': {'cool_vector_field': [-0.9203098606975535, -0.8629298912981729, -0.4274567965220182, 0.5190442025173878, -0.32420767814040885],
                                                  'some_text_field': 'This is nested text for doc number 385'},
                                'non_nested_text': 'This doc is number 385'},
                    'inner_hits': {'nested_object': {'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}}}}],
          'max_score': 2.0406117,
          'total': {'relation': 'gte', 'value': 10000}},
 'status': 200,
 'timed_out': False,
 'took': 3}
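
For anyone trying to reproduce this, here is a minimal sketch of how the affected hits can be flagged in a response dict shaped like the one above. The helper name `hits_without_inner_hits` and the trimmed `sample_response` are made up for illustration; they are not part of any client library.

```python
def hits_without_inner_hits(response, path="nested_object"):
    """Return the _id of every top-level hit whose inner hits for `path` are empty."""
    empty_ids = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        # A missing or empty inner 'hits' list counts as "no inner hits".
        if not inner.get("hits", {}).get("hits"):
            empty_ids.append(hit["_id"])
    return empty_ids


# Trimmed-down response with the same shape as the search result above
# (illustrative only; scores and sources are omitted).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "bRkrI4IBuKQL3UqO8DkV",
             "inner_hits": {"nested_object": {"hits": {"hits": [{"_score": 2.0406117}]}}}},
            {"_id": "bhkrI4IBuKQL3UqO8DkV",
             "inner_hits": {"nested_object": {"hits": {"hits": []}}}},
            {"_id": "bxkrI4IBuKQL3UqO8DkV",
             "inner_hits": {"nested_object": {"hits": {"hits": []}}}},
        ]
    }
}

print(hits_without_inner_hits(sample_response))
```

Running this over every search response makes it easy to count how often the problem occurs and which document IDs are affected.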

Query:

{'query': {'nested': {'inner_hits': {},
                      'path': 'nested_object',
                      'query': {'knn': {'nested_object.cool_vector_field': {'k': 3,
                                                                            'vector': [-0.53387915, -0.14078664, -0.41952186,  0.11891716, -0.30830444]}}},
                      'score_mode': 'max'}},
 'size': 3}

Index settings:

{'mappings': {'properties': {'nested_object': {'properties': {'cool_vector_field': {'dimension': 5,
                                                                                    'method': {'engine': 'nmslib',
                                                                                               'name': 'hnsw',
                                                                                               'parameters': {'ef_construction': 128,
                                                                                                              'm': 24},
                                                                                               'space_type': 'innerproduct'},
                                                                                    'type': 'knn_vector'},
                                                              'some_text_field': {'type': 'text'}},
                                               'type': 'nested'}}},
 'settings': {'index': {'knn': True,
                        'knn.algo_param.ef_search': 100,
                        'refresh_interval': '30s'},
              'number_of_shards': 1}}

Note that this doesn't happen for every search. It is more prevalent with datasets larger than 10,000 docs.

Could this be a bug with the kNN plugin? Any help would be greatly appreciated!

Update: After experimenting with this problem further, it seems that, for a single query, the documents retrieved without inner hits form groups of consecutively indexed documents. I have no idea why. For example (I set each ID to correspond to the order in which the document was indexed):

['3395', '3396', '3397', '3398', '3399', '4250', '4251', '4252', '4253', '4254']
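
The grouping can be checked mechanically. This is a rough sketch that assumes the IDs are numeric strings assigned in indexing order, as in my setup; `group_consecutive` is a made-up helper name.

```python
def group_consecutive(ids):
    """Split numeric string IDs into runs of consecutive integers."""
    runs = []
    for n in sorted(int(i) for i in ids):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)   # extends the current run
        else:
            runs.append([n])     # starts a new run
    return runs


missing_inner_hits = ['3395', '3396', '3397', '3398', '3399',
                      '4250', '4251', '4252', '4253', '4254']

print(group_consecutive(missing_inner_hits))
# → [[3395, 3396, 3397, 3398, 3399], [4250, 4251, 4252, 4253, 4254]]
```

For the IDs above this yields two runs of five consecutive documents each.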
