Reputation: 2427
Is it possible to retrieve the largest document (or just its size) in Elasticsearch with a single query?
The motivation is to cache returned documents in a MySQL store, so I would like to get an idea of the order of magnitude of the largest docs, to decide whether to go with TEXT, MEDIUMTEXT or LONGTEXT.
EDIT: This is on ES 1.3.
Upvotes: 1
Views: 2684
Reputation: 61
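My quick, rough approach was to create a temporary index via reindex, adding a new field that holds the length of each document's string representation (this relies on the _reindex API and Painless scripting, so it needs a much newer version than ES 1.3):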
POST _reindex
{
  "source": {
    "index": "input_index"
  },
  "dest": {
    "index": "docs_size_index"
  },
  "script": {
    "source": """
      HashMap st = ctx._source;
      if (st != null){
        ctx._source['docsize'] = st.toString().length();
      } else {
        ctx._source['docsize'] = 0;
      }
    """
  }
}
Then query the temporary index, sorting on the new field in descending order:
GET docs_size_index/_search
{
  "_source": {
    "includes": ["docsize"]
  },
  "sort": [
    {
      "docsize": {
        "order": "desc"
      }
    }
  ]
}
The first hit is the largest document in the index (by the length of its string representation, which only approximates its JSON size). You can then retrieve it and check its actual size:
curl -XGET "http://localhost:9700/modules/_doc/<DOC_ID>" | json_pp > biggest_doc.json
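Once the largest document is saved, its size can be compared against the MySQL column limits. Here is a minimal Python sketch of that step, assuming the document will be cached as UTF-8 JSON. The helper names `json_size_bytes` and `smallest_text_type` are hypothetical; the limits are the maximum byte lengths of TEXT, MEDIUMTEXT and LONGTEXT.

```python
import json

# Maximum storage per MySQL column type, in bytes.
MYSQL_TEXT_LIMITS = [
    ("TEXT", 2**16 - 1),        # 65,535
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295
]


def json_size_bytes(doc):
    """Return the size of doc serialized as compact UTF-8 JSON."""
    text = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(text.encode("utf-8"))


def smallest_text_type(size_bytes):
    """Return the smallest MySQL text type that can hold size_bytes."""
    for name, limit in MYSQL_TEXT_LIMITS:
        if size_bytes <= limit:
            return name
    raise ValueError("document exceeds the LONGTEXT limit")
```

For example, after loading biggest_doc.json with `json.load`, `smallest_text_type(json_size_bytes(doc))` would suggest a column type. Leaving some headroom above the measured maximum is probably wise, since larger documents may be indexed later.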
Upvotes: 0
Reputation: 2118
To the best of my knowledge, there's no such possibility out of the box.
You could, however, try a scripted aggregation, where the value of the aggregation is the sum of the length of all fields (or all fields you care about).
Another option is to sort the documents with a script, for example:
"sort": {
  "_script": {
    "script": "doc['field1'].value.size() + doc['field2'].value.size()",
    "type": "number",
    "order": "desc"
  }
}
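If scripting is disabled, or the list of fields is not known in advance, the same comparison can be done on the client. This is a rough Python sketch, assuming the hits arrive as dicts with `_id` and `_source` keys (the shape of Elasticsearch search hits). `largest_hit` is a hypothetical helper, and fetching the hits (for instance with a scan/scroll over the whole index) is left out.

```python
import json


def largest_hit(hits):
    """Return (doc_id, size_in_bytes) for the hit with the largest _source.

    Size is measured as the UTF-8 length of the JSON-serialized _source.
    Returns (None, -1) when hits is empty.
    """
    best_id, best_size = None, -1
    for hit in hits:
        serialized = json.dumps(hit["_source"], ensure_ascii=False)
        size = len(serialized.encode("utf-8"))
        if size > best_size:
            best_id, best_size = hit["_id"], size
    return best_id, best_size
```

This reads every document once, so it is slower than a server-side sort, but it measures whole documents rather than a fixed set of fields.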
Upvotes: 1