Reputation: 369
I know how to set up the river plugin and search across it. The problem is that if the same document is edited multiple times (multiple revisions), only the data from the latest revision is retained and the older data is lost. I want to keep an index of all revisions for my entire CouchDB, so I don't have to keep the history in CouchDB itself and can retrieve the history of a doc through Elasticsearch instead of going to Futon. I know the issue will be uniquely determining a key for a CouchDB doc while indexing, but we can append the revision number to the key so that every key is unique.
I couldn't find a way to do this in any documentation. Does anyone have an idea how to do it?
Any suggestions/thoughts are welcome.
EDIT 1: To be more explicit, at the moment Elasticsearch saves CouchDB docs like this:
"_index": "foo",
"_type": "foo",
"_id": "27fd33f3f51e16c0262e333f2002580a",
"_score": 1.0310782,
"_source": {
    "barVal": "bar",
    "_rev": "3-d10004227969c8073bc573c33e7e5cfd",
    "_id": "27fd33f3f51e16c0262e333f2002580a",
    ...
}
Here the _id from CouchDB is the same as the _id for the search index. I want the search index _id to be concat("_id", "_rev") from CouchDB.
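In other words, the desired indexed hit would look something like this (a sketch based on the example above; the underscore separator is illustrative, not prescribed):

```
"_index": "foo",
"_type": "foo",
"_id": "27fd33f3f51e16c0262e333f2002580a_3-d10004227969c8073bc573c33e7e5cfd",
"_source": {
    "barVal": "bar",
    ...
}
```

With a key like that, each revision would land in the index as its own document instead of overwriting the previous one.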
EDIT 2 (after trying out @DaveS's solution): So I tried the following, but it didn't work; the search still indexes documents by CouchDB's _id.
What I did:
curl -XDELETE 127.0.0.1:9200/_all
curl -XPUT 'localhost:9200/foo_test' -d '{
"mappings": {
"foo_test": {
"_id": {
"path": "newId",
"index": "not_analyzed",
"store": "yes"
}
}
}
}'
curl -XPUT 'localhost:9200/_river/foo_test/_meta' -d '{
"type": "couchdb",
"couchdb": {
"host": "127.0.0.1",
"port": 5984,
"db": "foo_test",
"script": "ctx.doc.newId = ctx.doc._id + ctx.doc._rev",
"filter": null
},
"index": {
"index": "foo_test",
"type": "foo_test",
"bulk_size": "100",
"bulk_timeout": "10ms"
}
}'
And after this, when I search for a doc I added, I get:
_index: foo_test
_type: foo_test
_id: 53fa6fcf981a01b05387e680ac4a2efa
_score: 8.238497
_source: {
    _rev: 4-8f8808f84eebd0984d269318ad21de93
    content: {
        foo: bar
        foo3: bar3
        foo2: bar2
    }
    _id: 53fa6fcf981a01b05387e680ac4a2efa
    newId: 53fa6fcf981a01b05387e680ac4a2efa4-8f8808f84eebd0984d269318ad21de93
}
@DaveS - Hope this helps in explaining that Elasticsearch is not using the new path to define its "_id" field.
EDIT 3 (for @dadoonet) - hope this helps.
This is how you get all the older revision info for a CouchDB doc. Then you can iterate through the revisions that are still available, fetch their data, and index them.
Get a list of all revisions for a doc id:
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?revs_info=true
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
 "_rev":"2-16e89e657d637c67749c8dd9375e662f",
 "foo":"bar",
 "foo2":"bar2",
 "_revs_info":[
  {"rev":"2-16e89e657d637c67749c8dd9375e662f", "status":"available"},
  {"rev":"1-4c6114c65e295552ab1019e2b046b10e", "status":"available"}]}
And then you can retrieve each revision (if its status is available):
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=1-4c6114c65e295552ab1019e2b046b10e
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"1-4c6114c65e295552ab1019e2b046b10e",
"foo":"bar"}
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=2-16e89e657d637c67749c8dd9375e662f
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"2-16e89e657d637c67749c8dd9375e662f",
"foo":"bar",
"foo2":"bar2"}
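The two steps above (list the revisions, then fetch each one whose status is available) can be sketched as a small helper that builds the composite keys to index under. This is my own sketch, not part of the river; the composite_ids name and the underscore separator are assumptions, and the sample response is taken from the revs_info example above:

```python
import json


def composite_ids(doc_id, revs_info):
    """Build one composite Elasticsearch id per available CouchDB revision."""
    return [
        "%s_%s" % (doc_id, info["rev"])
        for info in revs_info
        if info["status"] == "available"
    ]


# Sample response from ?revs_info=true (the example above)
response = json.loads("""{
  "_id": "cde07b966fa7f32433d33b8d16000ecd",
  "_rev": "2-16e89e657d637c67749c8dd9375e662f",
  "foo": "bar",
  "foo2": "bar2",
  "_revs_info": [
    {"rev": "2-16e89e657d637c67749c8dd9375e662f", "status": "available"},
    {"rev": "1-4c6114c65e295552ab1019e2b046b10e", "status": "available"}
  ]
}""")

ids = composite_ids(response["_id"], response["_revs_info"])
print(ids)
```

Each id in the result names one revision; you would then GET the doc with ?rev=... (as shown above) and index the body under that id.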
Upvotes: 1
Views: 812
Reputation: 6419
You might consider adjusting your mapping to pull the _id field from a generated field, e.g. from the docs:
{
"couchdoc" : {
"_id" : {
"path" : "doc_rev_id"
}
}
}
Then "just" modify the river to concatenate the strings and add the result to the document in doc_rev_id. One way to do that might be to use the script filter that the couchdb river provides, e.g. something like this:
{
"type" : "couchdb",
"couchdb" : {
"script" : "ctx.doc.doc_rev_id = ctx.doc._id + '_' + ctx.doc._rev"
}
}
You'd take the above snippet and PUT it to the river's endpoint, possibly with the rest of the definition, e.g. via curl -XPUT 'localhost:9200/_river/my_db/_meta' -d '<snippet from above>'. Take care to escape the quotes as necessary.
Upvotes: 0
Reputation: 14492
I don't think you can, because as far as I remember, CouchDB does not keep the older versions of a document: after a compaction, old revisions are removed.
That said, even if it were doable in CouchDB, you cannot store different versions of a document in Elasticsearch.
To do that, you have to define an ID for the new document: for example: DOCID_REVNUM
That way, new revisions won't update the existing document.
The CouchDB river does not do that for now.
I suggest that you manage this in CouchDB (i.e. create a new doc for each new version of a document) and let the standard CouchDB river index it as another document.
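A minimal sketch of that approach, assuming you copy each fetched revision into a new standalone doc whose _id embeds the revision (the as_versioned_doc helper is hypothetical, and the sample input is revision 1 from the question's EDIT 3):

```python
def as_versioned_doc(doc):
    """Copy a CouchDB doc fetched at a specific revision into a new doc
    whose _id is DOCID_REVNUM, so the river indexes it as its own document."""
    versioned = dict(doc)
    rev = versioned.pop("_rev")  # the copy gets its own fresh revision history
    versioned["_id"] = "%s_%s" % (versioned["_id"], rev)
    return versioned


# Revision 1 of the doc from the question's EDIT 3
old = {
    "_id": "cde07b966fa7f32433d33b8d16000ecd",
    "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
    "foo": "bar",
}
print(as_versioned_doc(old))
```

You would PUT the returned doc back into CouchDB; since its _id differs per revision, the river indexes it as a separate document instead of an update.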
Hope this helps
Upvotes: 2