Reputation: 497
I have stored my data in ArangoDB 2.7 in the following format:
{"content": "Book.xml", "type": "string", "name": "name", "key": 102}
{"content": "D:/XMLexample/Book.xml", "type": "string", "name": "location", "key": 102}
{"content": "xml", "type": "string", "name": "mime-type", "key": 102}
{"content": 4130, "type": "string", "name": "size", "key": 102}
{"content": "Sun Aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102}
{"content": "Wed Jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102}
{"content": "catalog", "type": "tag", "name": "root", "key": 102}
{"content": "book", "type": "string", "name": "tag", "key": 103}
{"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103}
{"content": "Gambardella, Matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031}
{"content": "XML Developer's Guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031}
{"content": "Computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031}
{"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031}
{"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031}
{"content": "An in-depth look at creating applications with XML.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}
I am increasing the number of documents: 1,000, 10,000, 100,000, 1,000,000, 10,000,000, and so on. The average query response time grows with the number of documents, ranging from 0.2 s up to 3.0 s. I have created a hash index on this collection. My question is whether this response time can be kept down as the number of documents increases.
I have also created a full-text index on the content attribute, and the same thing happens with full-text search: the response time varies from 0.05 s to 0.3 s.
So, is there any way to reduce this time further?
Upvotes: 1
Views: 128
Reputation: 6067
One cannot utilize indices in the first level of nested FOR statements.
However, starting with ArangoDB 2.8 you can utilize array indices:
The values you query are data[*].name and data[*].type, so let's create indices for them:
db.DSP.ensureIndex({type:"hash", fields: ['data[*].type']});
db.DSP.ensureIndex({type:"hash", fields: ['data[*].name']});
and now let's re-formulate the query so it can utilize this index. We start with a simple version to experiment with, and use explain to verify that it actually uses the index:
db._explain('FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k')
Query string:
FOR k IN DSP FILTER "modified_date" IN k.data[*].name RETURN k
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR k IN DSP /* hash index scan */
5 ReturnNode 1 - RETURN k
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash DSP false false 100.00 % [ `data[*].name` ]
("modified_date" in k.`data`[*].`name`)
So we see we can filter on the array conditions first, so that only the documents you actually want to inspect reach the inner loop:
FOR k IN DSP FILTER "modified_date" IN k.data[*].name || "string" IN k.data[*].type
FOR p IN k.data FILTER p.name == "modified_date" || p.type == "string" RETURN p
Upvotes: 1