Reputation: 1057
Is it possible to obtain a records updated data from Arangodb commit logs if there is such a thing as commit logs. We have a couple of documents which where update but we did update their modified date field. We however would like to retrieve all updated/changed documents since a certain date.
Upvotes: 1
Views: 804
Reputation: 9097
There are two solutions for this:
Solution one:
The first solution is to not use the commit log, but run an AQL query on the collection and filter on the modified
date field. This will be efficient if there is a sorted index (i.e. skiplist index) on the modified
field.
An example setup for this can be found in the following setup script, which populates a collection test
with 50K documents with random modification dates:
/* use some fixed base date to make query produce results */
var baseDate = 1478779081650; /* 2016-11-10T11:58:01.650Z */
db._create("test");
db.test.ensureIndex({ type: "skiplist", fields: [ "modified" ]});
/* create 50,000 documents with modified dates between
2016-11-10T11:58:01.650Z and up to two years in the past */
for (var i = 0; i < 50000; ++i) {
db.test.insert({ value: i, modified: new Date(baseDate - Math.floor(Math.random() * 1000 * 60 * 60 * 24 * 365 * 2)).toISOString() });
}
Then using AQL it's straight-forward to find documents with a modified
date higher than a specific value:
var query = "FOR doc IN test FILTER doc.modified >= @date RETURN doc";
/* find all documents modified since 2016-11-09T12:00:00.000Z */
var docs = db._query(query, { date: "2016-11-09T12:00:00.000Z" }).toArray();
require("internal").print(docs);
It's also possible to do queries on date ranges, e.g.
var query = "FOR doc IN test FILTER doc.modified >= @from && doc.modified <= @to RETURN doc";
var docs = db._query(query, { from: "2016-11-09T00:00:00.000Z", to: from: "2016-11-09T23:59:59.999Z" }).toArray();
require("internal").print(docs);
Solution two:
The second solution is to use the WAL change log that ArangoDB also exposes via its HTTP API. But this is much more complicated and requires keeping state on the client side.
The basic idea is to query the WAL change log API at /_api/replication/logger-follow
for the given collection. This API call can be given an initial tick value. This controls from where in the change log the request will start looking. In the beginning this tick value is unclear, so simply omit it.
Using curl, the call for the collection test
would be:
curl -X GET "http://127.0.0.1:8529/_db/_system/_api/replication/logger-follow?collection=test" --basic --user "root:" --dump -
All call to this API will produce some HTTP headers with state information and the WAL entries for the collection in chronological order, e.g.
...
X-Arango-Replication-Checkmore: true
X-Arango-Replication-Lastincluded: 6103060
X-Arango-Replication-Lasttick: 6251758
...
{"tick":"6101295","type":2000,"database":"1","cid":"6101294","cname":"test","data":"cid":"6101294","deleted":false,"doCompact":true,"indexBuckets":8,"isSystem":false,"isVolatile":false,"maximalSize":33554432,"name":"test","type":2,"version":5,"waitForSync":false}}
{"tick":"6101298","type":2100,"database":"1","cid":"6101294","cname":"test","data":{"fields":["modified"],"id":"6101297","sparse":false,"type":"skiplist","unique":false}}
{"tick":"6101302","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":"_id":"test/6101300","_key":"6101300","_rev":"6101300","modified":"2015-06-26T14:18:30.732Z","value":0}}
{"tick":"6101305","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":"_id":"test/6101304","_key":"6101304","_rev":"6101304","modified":"2016-11-09T07:14:08.146Z","value":1}}
{"tick":"6101308","type":2300,"tid":"0","database":"1","cid":"6101294","cname":"test","data":"_id":"test/6101307","_key":"6101307","_rev":"6101307","modified":"2015-05-14T04:45:01.202Z","value":2}}
...
As can be seen the change log contains not only the insert/update operations for the documents but also the creation of the collection and the creation of the index. It will also contain all remove operations and other operations that change the meta-data of the collection.
Using the change log results, you can now filter them on the client side for type
2300, which is a document insert or update operation, and then peek into data
.modified
of each returned document. You can then use the documents which satisfy your search condition.
Note that the result of the request may not contain all operations, but it may contain only a fraction of them. It may be necessary to fetch more data from the server. This can be done by calling the API again, now using the value of the X-Arango-Replication-Lastincluded
HTTP response header as tick
value, e.g.
curl -X GET "http://127.0.0.1:8529/_db/_system/_api/replication/logger-follow?collection=test&from=6103060" --basic --user "root:" --dump -
This will produce even more operations. You can call the API again and again until it produces no more results and the value of the X-Arango-Replication-Checkmore
HTTP response header becomes false
. That means you have fetched all operations for the time being.
This solution requires the client to potentially issue multiple HTTP requests and keep state (the last fetched tick
value), so it's not as easy to use as the AQL-based solution.
Upvotes: 2