filippo

Searching multiple indexes in Azure Cognitive Search

I have a hundred or so indexes holding different information from various sources. All the data is embedded with the same model (ada-002).

Now I'm trying to iterate over the list of indexes and query each:

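import pandas as pd
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import VectorizedQuery

# List the names of every index on the service (same endpoint/credential as the queries)
index_client = SearchIndexClient(endpoint=search_service_endpoint, credential=credential)
index_names = list(index_client.list_index_names())

# The query vector is the same for every index, so build the vector query once
vector_query = VectorizedQuery(vector=search_vector, k_nearest_neighbors=3, fields="content_vector")

rows_list = []
for index_name in index_names:
    search_client = SearchClient(search_service_endpoint, index_name, credential)
    # Hybrid query: full-text search on `query` plus vector search on `search_vector`
    results = search_client.search(
        search_text=query,
        vector_queries=[vector_query],
        select=["title", "text"],
        top=3,
    )
    for row in results:
        rows_list.append({
            "index": index_name,
            "score": row["@search.score"],
            "title": row["title"],
            "text": row["text"],
        })
res = pd.DataFrame(rows_list)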

Then I compute the average score for each index:

grouped = res.groupby('index')['score'].agg(['mean'])
grouped

However, the resulting average scores don't seem consistent: the index that holds the correct answers ranks below several unrelated ones.

index_name    avg_score
wrong-idx     8.3763725667
other1-idx    4.5701991333
other2-idx    4.2485168
other3-idx    3.5756512667
CORRECT-idx   2.5451367667
...           ...

I had hoped that, since the same embedding model is used for all the content, the cosine distance would be consistent across indexes. I realize that says nothing about the score itself, but still.
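For comparison, cosine similarity depends only on the query vector and the document vector, so with one embedding model it is on the same scale no matter which index stores the document. The full-text (BM25) part of a hybrid query's score, by contrast, is computed from each index's own document and term statistics. A minimal pure-Python sketch of the cosine computation (the vectors are made-up examples):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; depends only on a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical query and document embeddings (real ada-002 vectors have 1536 dimensions)
query_vector = [0.1, 0.3, 0.5]
doc_vector = [0.2, 0.1, 0.4]
print(cosine_similarity(query_vector, doc_vector))
```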

Is there a way to run this search across all the indexes, or to normalize the scores, so that the highest-scoring index is the one with the most relevant answers?
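One possible workaround, sketched under assumptions: if the content_vector field is retrievable and added to select, the cosine similarity can be recomputed client-side for the returned rows and averaged per index, which puts every index on the same scale. The function name rank_indexes_by_cosine and the DataFrame layout (an "index" column plus a "content_vector" column of float lists) are hypothetical:

```python
import numpy as np
import pandas as pd


def rank_indexes_by_cosine(rows: pd.DataFrame, query_vector) -> pd.DataFrame:
    """Average the query/document cosine similarity per index, best index first.

    Assumes (hypothetically) that `rows` has an "index" column and a
    "content_vector" column holding each returned document's embedding.
    """
    q = np.asarray(query_vector, dtype=float)
    q = q / np.linalg.norm(q)
    # Stack the document vectors into an (n_rows, dim) matrix and L2-normalize each row
    vectors = np.array(rows["content_vector"].tolist(), dtype=float)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Dot product of unit vectors = cosine similarity
    scored = rows.assign(cosine=vectors @ q)
    return (
        scored.groupby("index")["cosine"]
        .mean()
        .sort_values(ascending=False)
        .rename("avg_cosine")
        .reset_index()
    )


# Hypothetical rows collected from two indexes
example_rows = pd.DataFrame({
    "index": ["a", "a", "b"],
    "content_vector": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
})
print(rank_indexes_by_cosine(example_rows, [1.0, 0.0]))
```

Averaging the top-3 similarities is only one way to score an index; taking the maximum per index would be an equally simple choice.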
