Reputation: 2911
Performance issue with arangosearch
I have a document collection like:
{
  "passage": "Some long text",
  "meta": {
    "language": "en",
    "Region": "Asia Pacific"
  },
  "document_name": "my document.pdf"
}
Now, to enable full-text search I created a view and link configuration like:
"links": {
  "my_coll": {
    "analyzers": [
      "myAnalyzer"
    ],
    "fields": {
      "passage": {
        "analyzers": [
          "myAnalyzer"
        ]
      }
    },
    "includeAllFields": false,
    "storeValues": "none",
    "trackListPositions": false
  }
}
Now I want to search the passage field, but only for a particular language and region.
My query looks like:
LET token = TOKENS("My text to be search", "myAnalyzer")
FOR docs IN my_vw
  SEARCH ANALYZER(token ANY == docs.passage, "myAnalyzer")
  FILTER docs.meta.language == "en"
  FILTER docs.meta.Region == "Global"
  SORT BM25(docs) DESC
  LIMIT 50
  RETURN { passage: docs.passage, score: BM25(docs) }
This query takes around 4 seconds to answer. There are 3,227,261 documents in the collection.
Execution plan:
Id   NodeType            Est.      Comment
 1   SingletonNode             1   * ROOT
 3   EnumerateViewNode   3227261     - FOR docs IN my_vw SEARCH ANALYZER(([ "my", "token" ] any == docs.`passage`), "myAnalyzer") LET #10 = BM25(docs) /* view query */
 4   CalculationNode     3227261     - LET #2 = ((docs.`meta`.`language` == "en") && (docs.`meta`.`Region` == "myAnalyzer")) /* simple expression */
 5   FilterNode          3227261     - FILTER #2
 9   SortNode            3227261     - SORT #10 DESC /* sorting strategy: constrained heap */
10   LimitNode                50     - LIMIT 0, 50
11   CalculationNode          50     - LET #8 = { "passage" : docs.`passage`, "score" : #10 } /* simple expression */
12   ReturnNode               50     - RETURN #8
It is selecting all the documents first and then applying filters. Is there any way to apply the filter first and then search?
Can you help improve this query's performance?
Upvotes: 0
Views: 191
Reputation: 141
I suggest avoiding post-filtering.
It is better to index the meta.language
and meta.Region
fields in the view, with an adjusted link definition:
"links": {
  "my_coll": {
    "analyzers": [
      "myAnalyzer"
    ],
    "fields": {
      "passage": { "analyzers": [ "myAnalyzer" ] },
      "meta": {
        "fields": {
          "language": { "analyzers": [ "identity" ] },
          "Region": { "analyzers": [ "identity" ] }
        }
      }
    },
    "includeAllFields": false,
    "storeValues": "none",
    "trackListPositions": false
  }
}
Then you can transform your query to:
LET token = TOKENS("My text to be search", "myAnalyzer")
FOR docs IN my_vw
  SEARCH ANALYZER(token ANY == docs.passage, "myAnalyzer")
    AND docs.meta.language == "en"
    AND docs.meta.Region == "Global"
  SORT BM25(docs) DESC
  LIMIT 50
  RETURN { passage: docs.passage, score: BM25(docs) }
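A quick way to check that the metadata fields are actually searchable through the view is a count query that uses only the two equality conditions. This is a rough sketch, assuming the view is still named my_vw and that meta.language and meta.Region are indexed with the identity analyzer, which is what == matches against when it is not wrapped in ANALYZER():

```aql
// Sanity check (assumption: meta.language and meta.Region are indexed
// in my_vw with the "identity" analyzer).
// Counts the documents the view can find by metadata alone.
FOR d IN my_vw
  SEARCH d.meta.language == "en" AND d.meta.Region == "Global"
  COLLECT WITH COUNT INTO n
  RETURN n
```

If this returns 0 even though matching documents exist in the collection, the fields are probably indexed with a different analyzer, or the view has not finished indexing yet. Once it returns a plausible count, explaining the combined query should show both conditions inside the EnumerateViewNode, with no separate FilterNode after it.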
Upvotes: 1