Reputation: 5132
I would like to store an array of dense_vector values in my document, but this does not work the way it does for other data types, e.g.:
PUT my_index
{
"mappings": {
"properties": {
"my_vectors": {
"type": "dense_vector",
"dims": 3
},
"my_text" : {
"type" : "keyword"
}
}
}
}
PUT my_index/_doc/1
{
"my_text" : "text1",
"my_vectors" : [[0.5, 10, 6], [-0.5, 10, 10]]
}
returns:
'1 document(s) failed to index.',
{'_index': 'my_index', '_type': '_doc', '_id': 'some_id', 'status': 400, 'error':
{'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by':
{'type': 'parsing_exception',
'reason': 'Failed to parse object: expecting token of type [VALUE_NUMBER] but found [START_ARRAY]'
}
}
}
How do I achieve this? Different documents will have a variable number of vectors but never more than a handful.
Also, I would then like to query it by computing cosineSimilarity for each vector in that array. The code below is how I normally do it when I have only one vector in the doc.
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "(1.0+cosineSimilarity(params.query_vector, doc['my_vectors']))",
"params": {"query_vector": query_vector}
}
}
Ideally, I would like the closest similarity or an average.
Upvotes: 7
Views: 12377
Reputation: 2273
I got to this post by attempting to have a set of vectors in my document.
When I do this:
"mappings": {
"properties": {
"vectors": {
"type": "nested",
"properties": {
"vector": {
"type": "dense_vector",
"dims": 768,
"index": "true",
"similarity": "cosine"
}
}
},
"my_text" : {
"type" : "keyword"
}
}
}
I get:
BadRequestError: BadRequestError(400, 'illegal_argument_exception', "[dense_vector] fields cannot be indexed if they're within [nested] mappings")
If I remove "index": "true" and "similarity": "cosine", the problem goes away (but then I can't use kNN, which is my main goal).
Hopefully this helps someone.
Upvotes: 0
Reputation: 146
The dense_vector
datatype expects one array of numeric values per document like so:
PUT my_index/_doc/1
{
"my_text" : "text1",
"my_vector" : [0.5, 10, 6]
}
To store any number of vectors, you could make the my_vectors field a "nested" type, which holds an array of objects, each object containing one vector:
PUT my_index
{
"mappings": {
"properties": {
"my_vectors": {
"type": "nested",
"properties": {
"vector": {
"type": "dense_vector",
"dims": 3
}
}
},
"my_text" : {
"type" : "keyword"
}
}
}
}
PUT my_index/_doc/1
{
"my_text" : "text1",
"my_vectors" : [
{"vector": [0.5, 10, 6]},
{"vector": [-0.5, 10, 10]}
]
}
EDIT
Then, to query the documents, you can use the following (as of ES v7.6.1):
{
"query": {
"nested": {
"path": "my_vectors",
"score_mode": "max",
"query": {
"function_score": {
"script_score": {
"script": {
"source": "(1.0+cosineSimilarity(params.query_vector, 'my_vectors.vector'))",
"params": {"query_vector": query_vector}
}
}
}
}
}
}
}
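For anyone building this request from Python, here is a minimal sketch of the same query as a plain dict. The helper names nested_vector_query and local_score are made up for illustration and are not part of any Elasticsearch client. local_score is only a local reference calculation of what I would expect score_mode "max" or "avg" to produce for one document; it does not talk to a cluster.

```python
import math


def nested_vector_query(query_vector, score_mode="max",
                        path="my_vectors", field="vector"):
    """Build the nested cosine-similarity query body as a plain dict.

    score_mode is passed through unchanged, e.g. "max" for the closest
    stored vector or "avg" for the average over all stored vectors.
    """
    source = f"(1.0+cosineSimilarity(params.query_vector, '{path}.{field}'))"
    return {
        "query": {
            "nested": {
                "path": path,
                "score_mode": score_mode,
                "query": {
                    "function_score": {
                        "script_score": {
                            "script": {
                                "source": source,
                                "params": {"query_vector": list(query_vector)},
                            }
                        }
                    }
                },
            }
        }
    }


def local_score(query_vector, doc_vectors, score_mode="max"):
    """Reference calculation only: 1 + cosine per vector, then max or mean."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scores = [1.0 + cosine(query_vector, v) for v in doc_vectors]
    if score_mode == "max":
        return max(scores)
    return sum(scores) / len(scores)


# Example with the two vectors from document 1.
body = nested_vector_query([0.5, 10, 6], score_mode="max")
doc_vectors = [[0.5, 10, 6], [-0.5, 10, 10]]
best = local_score([0.5, 10, 6], doc_vectors, "max")
mean = local_score([0.5, 10, 6], doc_vectors, "avg")
```

The resulting body dict can be sent as the search request body with whichever client you use. Switching score_mode to "avg" gives the average instead of the closest match.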
Few things to note:
The query must be wrapped in a nested declaration (due to using nested objects to store the vectors).
Use score_mode to change the scoring behavior. For your case, "max" scores each document by its largest cosine similarity, i.e. by the stored vector that is most similar to the query vector.
To see which nested vectors matched and how each one scored, use inner_hits.
Upvotes: 13
Reputation: 7221
From the documentation, the dense_vector datatype "stores dense vectors of float values ... A dense_vector field is a single-valued field."
In your example, you want to index multiple vectors in the same property. But as the documentation says, the field must be single-valued. If a document has multiple vectors, they need to be split across different properties.
No workaround :(
So you need to put each vector in its own field, then loop over those fields in your script and keep the best-scoring value.
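A rough sketch of that idea in Python, assuming hypothetical field names such as my_vector_1 and my_vector_2, each mapped as its own dense_vector. The helper multi_field_max_query is made up for illustration. It only generates the request body, and the generated Painless source has not been validated against a cluster. The size() check is there because documents with fewer vectors will be missing some of the fields.

```python
def multi_field_max_query(query_vector, fields):
    """Build a script_score query that keeps the best score across fields.

    `fields` lists the (hypothetical) dense_vector field names, one per
    stored vector. Fields missing from a document are skipped.
    """
    checks = []
    for name in fields:
        checks.append(
            f"if (doc['{name}'].size() != 0) {{ "
            f"best = Math.max(best, 1.0 + "
            f"cosineSimilarity(params.query_vector, '{name}')); }}"
        )
    source = "double best = 0.0; " + " ".join(checks) + " return best;"
    return {
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": source,
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }


# Example: at most two vectors per document (assumed field names).
multi_body = multi_field_max_query([0.5, 10, 6], ["my_vector_1", "my_vector_2"])
multi_source = multi_body["query"]["script_score"]["script"]["source"]
```

The unrolled checks stand in for a loop because the field names are fixed at mapping time, so the script grows with the maximum number of vectors you allow per document.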
Upvotes: 0