Armen Sanoyan

Reputation: 2042

Get RequestError(400, 'search_phase_execution_exception', 'runtime error') for cossimilarity

I am trying to do semantic search with Elasticsearch using tensorflow_hub, but I get RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error'). Based on search_phase_execution_exception, I suspect the data is corrupted (as suggested in this stack question). My document structure looks like this:

{
"settings": {
  "number_of_shards": 2,
  "number_of_replicas": 1
},
 "mappings": {
  "dynamic": "true",
  "_source": {
    "enabled": "true"
  },
  "properties": {
        "id": {
            "type":"keyword"
        },
        "title": {
            "type": "text"
        },
        "abstract": {
            "type": "text"
        },
        "abs_emb": {
            "type":"dense_vector",
            "dims":512
        },
        "timestamp": {
            "type":"date"
        }
    }
}
}
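A quick sanity check that often pays off with this kind of mapping: the length of every embedding you index must match the dims declared in the mapping (512 here), or indexing/search will fail in confusing ways. A minimal sketch, assuming the mapping above is available as a Python dict (the helper name `check_dims` is mine, not from the question):

```python
# Hypothetical helper: verify an embedding's length against the dims
# declared in the mapping before indexing the document.
def check_dims(mapping_body, field, vector):
    declared = mapping_body["mappings"]["properties"][field]["dims"]
    if len(vector) != declared:
        raise ValueError(
            f"{field}: embedding has {len(vector)} elements, "
            f"but the mapping declares dims={declared}"
        )

my_document_structure = {
    "mappings": {
        "properties": {
            "abs_emb": {"type": "dense_vector", "dims": 512}
        }
    }
}

check_dims(my_document_structure, "abs_emb", [0.0] * 512)  # passes silently
```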

And I create the index using es.indices.create, then index the documents:

res = es.indices.delete(index=index, ignore=[404])  # drop any stale index first
es.indices.create(index=index, body=my_document_structure)  # the mapping dict above
for i in range(100):
  doc = {
    'timestamp': datetime.datetime.utcnow(),
    'id':id[i],
    'title':title[0][i],
    'abstract':abstract[0][i],
    'abs_emb':tf_hub_KerasLayer([abstract[0][i]])[0]
  }
  res = es.index(index=index, body=doc)

For my semantic search I use this code:

query = "graphene"
query_vector = list(embed([query])[0])

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, doc['abs_emb']) + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

response = es.search(
    index=index,
    body={
        "size": 5,
        "query": script_query,
        "_source": {"includes": ["title", "abstract"]}
    }
)

I know there are some similar questions on Stack Overflow and in the Elasticsearch forums, but I couldn't find a solution that works for me. My guess is that the document structure is wrong, but I can't figure out what exactly. I took the search query code from this repo. The full error message is very long and doesn't seem to contain much information, so I share only the last part of it.

~/untitled/elastic/venv/lib/python3.9/site-packages/elasticsearch/connection/base.py in 
_raise_error(self, status_code, raw_data)
320             logger.warning("Undecodable raw error response from server: %s", err)
321 
--> 322         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
323             status_code, error_message, additional_info
324         )

RequestError: RequestError(400, 'search_phase_execution_exception', 'runtime error')

And here is the error from the Elasticsearch server.

[2021-04-29T12:43:07,797][WARN ][o.e.c.r.a.DiskThresholdMonitor] 
[asmac.local] high disk watermark [90%] exceeded on 
[w7lUacguTZWH9xc_lyd0kg][asmac.local][/Users/username/elasticsearch- 
7.12.0/data/nodes/0] free: 17.2gb[7.4%], shards will be relocated 
away from this node; currently relocating away shards totalling [0] 
bytes; the node is expected to continue to exceed the high disk 
watermark when these relocations are complete
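Note that the server-side log above is a disk watermark warning, not necessarily the cause of the 400, but it is worth clearing: once the high watermark is exceeded, the node starts relocating shards and can refuse writes. One common workaround (an assumption about the setup, not something the question confirms) is to free disk space, or to temporarily raise the thresholds via the cluster settings API with a body like:

```json
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```

Transient settings revert on a full cluster restart, which makes them reasonable for a quick local experiment like this one.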

Upvotes: 5

Views: 14972

Answers (4)

Konard

Reputation: 3034

I had a similar issue, because I was using doc['text_vector'] instead of 'text_vector' (see the breaking change in Elasticsearch 7.6).

Once I added json.dumps to print the error, I found that the 'text_vector' field was not a dense_vector, because of this error message:

class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues

And to fix this error, I had to create the index with the mappings field set to:

{ "properties": { "text_vector": { "type": "dense_vector", "dims": 3 } } }

Here dims is the size of the vector (the number of elements in it).

A function to create index with mapping of types for fields:

def create():
    index = 'text_index'
    body = {
        "settings": {},
        "mappings": { "properties": { "text_vector": { "type": "dense_vector", "dims": 3 } } }
    }
    es.indices.create(index=index, body=body)

    click.echo(f"Index {index} is created with settings {json.dumps(body, indent=4)}")

A function to index any string of text:

def index(input_str):
    # text_embedding = embed([input_str])[0].numpy().tolist()
    text_embedding = [4.2, 3.4, -0.2]

    body = {'text': input_str, 'text_vector': text_embedding}
    
    res = es.index(index='text_index', body=body)
    click.echo(f"Indexed {input_str} with id {res['_id']}")

A function to execute vector search using elasticsearch for any text string:

def search(search_string):
    # search_vector = embed([search_string])[0].numpy().tolist()
    search_vector = [4.2, 3.4, -0.2]

    body = {
        'query': {
            'script_score': {
                'query': {'match_all': {}},
                'script': {
                    'source': "cosineSimilarity(params.query_vector, 'text_vector') + 1.0",
                    'params': {'query_vector': search_vector}
                }
            }
        }
    }
    try:
        res = es.search(index='text_index', body=body)
        click.echo("Search results:")
        for doc in res['hits']['hits']:
            click.echo(f"{doc['_id']} {doc['_score']}: {doc['_source']['text']}")
    except Exception as inst:
        print(type(inst))
        print(json.dumps(inst.args, indent=4))

Note: this is just an example; adjust the mappings and the embedding vectors used to index and search text according to your embedding model configuration. If it still does not help, read the error carefully in the JSON dump.

Full description of issue: https://github.com/Konard/elastic-search/issues/3

Full source code: https://github.com/Konard/elastic-search/commit/1df0748dd8e8a37c29e1d128eedf96d074e5a73f

Upvotes: 1

BEWARB

Reputation: 131

For me the issue was that I was using elastiknn_dense_float_vector instead of dense_vector, which is a still-open issue. I am converting my vector index to use dense_vector instead: https://github.com/alexklibisz/elastiknn/issues/323

Upvotes: 0

Vitaly

Reputation: 11

In my case the error was "Caused by: java.lang.ClassCastException: class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues"

My mistake was that I removed the ES index (the one that had the "type": "dense_vector" field) before ingesting content.

As a result, ES did not use the correct type when indexing the dense vectors: they were stored as useless lists of doubles. In that sense the ES index was 'corrupted': all 'script_score' queries returned 400.
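A cheap way to catch this kind of 'corrupted' index early is to fetch the mapping after (re)creating the index and assert that the field really is a dense_vector rather than dynamically-mapped floats. A sketch against the dict that the mapping call returns (the helper name is mine, not from the answer; with a live client you would pass es.indices.get_mapping(index=index)):

```python
# Hypothetical check: walk a get_mapping()-style response and verify the
# field type before running any script_score queries.
def assert_dense_vector(mapping_response, index, field):
    props = mapping_response[index]["mappings"]["properties"]
    actual = props[field]["type"]
    if actual != "dense_vector":
        raise TypeError(f"{field} is mapped as {actual!r}, expected 'dense_vector'")

# A dynamically-mapped float field -- exactly the 'corrupted' case described above:
bad = {"arxiv": {"mappings": {"properties": {"abs_emb": {"type": "float"}}}}}
try:
    assert_dense_vector(bad, "arxiv", "abs_emb")
except TypeError as e:
    print(e)  # abs_emb is mapped as 'float', expected 'dense_vector'
```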

Upvotes: 1

Val

Reputation: 217464

I think you're hitting the following issue and you should update your query to this:

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.query_vector, 'abs_emb') + 1.0",
            "params": {"query_vector": query_vector}
        }
    }
}

Also make sure that query_vector contains floats and not doubles.

Upvotes: 3
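For the second point, a defensive cast before building the query (a generic sketch, not from the answer) ensures the vector reaches Elasticsearch as plain Python floats, rather than e.g. integers or numpy scalars coming out of an embedding pipeline:

```python
# Generic sketch: normalize the embedding to a plain list of Python floats
# before putting it in the script params.
raw = [0, 1, -0.5]  # stand-in for embed([query])[0]
query_vector = [float(x) for x in raw]

assert all(isinstance(x, float) for x in query_vector)

script_params = {"query_vector": query_vector}  # safe to JSON-serialize
```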
