Reputation: 25
How can I return the number of documents that have more than 2 elements in the "words" list and more than 3 words in "word_combination"? Is there a way to count the number of words in a string?
Example: return document if (the length of "words" > 2) AND ("words.word_combination" has more than 3 words)
I have many documents stored. One document's structure looks like this:
"_source" : {
"group_words" : [
{
"amount" : 1140,
"words" : [
{
"relevance_score" : 56,
"points" : 66461,
"bits" : 100,
"word_combination" : "cat dog"
},
{
"relevance_score" : 84,
"points" : 45202,
"bits" : 990,
"word_combination" : "cat dog elephant"
},
{
"relevance_score" : 99,
"points" : 30974,
"bits" : 70,
"word_combination" : "elephant cat mouse leopard"
}
],
"group" : "whatever"
},
{
"amount" : 1320,
"words" : [
{
"relevance_score" : 25,
"points" : 53396,
"bits" : 70,
"word_combination" : "lion elephant"
},
{
"relevance_score" : 66,
"points" : 52166,
"bits" : 20,
"word_combination" : "lion mouse fish cat dog"
},
{
"relevance_score" : 82,
"points" : 49316,
"bits" : 810,
"word_combination" : "elephant cat mouse leopard dog lion"
},
{
"relevance_score" : 87,
"points" : 127705,
"bits" : 290,
"word_combination" : "elephant cat mouse leopard tiger lion"
}
],
"group" : "whatever"
},
{
"amount" : 11260,
"words" : [
{
"relevance_score" : 0,
"points" : 37909,
"bits" : 9000,
"word_combination" : "elephant cat mouse leopard tiger lion monkey"
},
{
"relevance_score" : 3,
"points" : 35782,
"bits" : 540,
"word_combination" : "elephant"
}
],
"group" : "whatever"
}
]
}
Upvotes: 1
Views: 44
Reputation: 217274
Regarding the number of elements in the words array, my advice is to store that number in an additional field words_count at indexing time.
{
"amount" : 1140,
"words_count": 3, <--- add this
"words" : [
{
"relevance_score" : 56,
"points" : 66461,
"bits" : 100,
"word_combination" : "cat dog"
},
{
"relevance_score" : 84,
"points" : 45202,
"bits" : 990,
"word_combination" : "cat dog elephant"
},
{
"relevance_score" : 99,
"points" : 30974,
"bits" : 70,
"word_combination" : "elephant cat mouse leopard"
}
],
"group" : "whatever"
},
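If the documents are assembled in application code before being sent to Elasticsearch, the counter can be filled in there. The following is only a minimal sketch in Python: the helper name add_words_count is made up for illustration, and it assumes each document is available as a plain dict shaped like the _source in the question.

```python
import copy


def add_words_count(source):
    """Return a copy of the document in which every group carries words_count.

    Hypothetical helper: assumes `source` is shaped like the _source in the
    question, i.e. a dict with a "group_words" list of groups.
    """
    doc = copy.deepcopy(source)  # leave the caller's document untouched
    for group in doc.get("group_words", []):
        # words_count is simply the length of the group's "words" array
        group["words_count"] = len(group.get("words", []))
    return doc


# Trimmed-down version of the sample document from the question
doc = {
    "group_words": [
        {
            "amount": 1140,
            "words": [
                {"word_combination": "cat dog"},
                {"word_combination": "cat dog elephant"},
                {"word_combination": "elephant cat mouse leopard"},
            ],
            "group": "whatever",
        },
        {
            "amount": 11260,
            "words": [
                {"word_combination": "elephant cat mouse leopard tiger lion monkey"},
                {"word_combination": "elephant"},
            ],
            "group": "whatever",
        },
    ]
}

enriched = add_words_count(doc)
counts = [group["words_count"] for group in enriched["group_words"]]
print(counts)  # [3, 2]
```

The enriched dict is what would then be indexed instead of the original one.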
Concerning the number of words (or tokens) in the word_combination field, there's a data type called token_count which exists exactly for this purpose. Simply define your mapping like this:
...
"word_combination": {
"type": "text",
"fields": {
"count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
Then in your query you can access word_combination.count, which contains the number of tokens (as analyzed by the specified analyzer) present in the word_combination field.
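To tie both parts together, here is a rough sketch of a query body that combines the two conditions, written as a Python dict. The function name build_count_query and its parameters are made up for illustration. It assumes the words_count field suggested above exists on every group and that the count sub-field is mapped under group_words.words.word_combination. Note that with a default object mapping the two range conditions may be satisfied by different entries of group_words; tying them to the same entry would require group_words to be mapped as nested, which the question does not state.

```python
import json


def build_count_query(min_words=2, min_tokens=3, nested=False):
    """Build a query body matching documents with more than `min_words`
    words in a group and a word_combination of more than `min_tokens` tokens.

    Hypothetical helper. Assumes group_words.words_count is filled at
    indexing time and word_combination has a token_count sub-field "count".
    """
    conditions = [
        {"range": {"group_words.words_count": {"gt": min_words}}},
        {"range": {"group_words.words.word_combination.count": {"gt": min_tokens}}},
    ]
    query = {"bool": {"filter": conditions}}
    if nested:
        # Only valid if group_words is mapped with "type": "nested" (assumption)
        query = {"nested": {"path": "group_words", "query": query}}
    return {"query": query}


body = build_count_query(nested=True)
print(json.dumps(body, indent=2))
```

Sending such a body to the index's _count endpoint should return the number of matching documents rather than the documents themselves.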
Upvotes: 1