Reputation: 497
I have stored my data in ArangoDB 2.7 in the following format:
{"content": "Book.xml", "type": "string", "name": "name", "key": 102}
{"content": "D:/XMLexample/Book.xml", "type": "string", "name": "location", "key": 102}
{"content": "xml", "type": "string", "name": "mime-type", "key": 102}
{"content": 4130, "type": "string", "name": "size", "key": 102}
{"content": "Sun Aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102}
{"content": "Wed Jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102}
{"content": "catalog", "type": "tag", "name": "root", "key": 102}
{"content": "book", "type": "string", "name": "tag", "key": 103}
{"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103}
{"content": "Gambardella, Matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031}
{"content": "XML Developer's Guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031}
{"content": "Computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031}
{"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031}
{"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031}
{"content": "An in-depth look at creating applications with XML.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}
I am increasing the number of documents: 1,000, 10,000, 100,000, 1,000,000, 10,000,000, and so on. The average query response time grows with the number of documents, ranging from 0.2 s up to 3.0 s. I have created a hash index on this collection. My question is whether this response time can be kept down as the number of documents increases.
I have also created a full-text index on the content attribute, and the same thing happens with full-text search: the response time varies from 0.05 s to 0.3 s.
So, is there any way to reduce this time further?
Upvotes: 1
Views: 128
Reputation: 6067
One cannot utilize indices in the first level of nested FOR statements.
However, starting with ArangoDB 2.8 you can utilize array indices:
The values you query are data[*].name and data[*].type, so let's create indices for them:
db.DSP.ensureIndex({type:"hash", fields: ['data[*].type']});
db.DSP.ensureIndex({type:"hash", fields: ['data[*].name']});
and now let's re-formulate the query so it can utilize this index. We start with a simple version to experiment with, and use explain to verify that it actually uses the index:
db._explain('FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k')
Query string:
FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR k IN DSP /* hash index scan */
5 ReturnNode 1 - RETURN k
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash DSP false false 100.00 % [ `data[*].name` ]
("modified_date" in k.`data`[*].`name`)
So we see we can filter on the array conditions first, so that only the documents you actually want to inspect reach the inner loop:
FOR k IN DSP FILTER "modified_date" IN k.data[*].name || "string" IN k.data[*].type
FOR p IN k.data FILTER p.name == "modified_date" || p.type == "string" RETURN p
Upvotes: 1