Reputation: 1018
I have an index that contains documents who have a status. These are initially imported with a job and their status is set to 0.
For simplicity:
{
"_uid" : 1234
"id" : 1
"name" : "someName",
"status" : 0
}
Then another import job runs and extends these objects by iterating over each object with status=0. Each object that is extended gets the status 1.
{
"_uid" : 1234
"id" : 1
"name" : "someName",
"newProperty" : "someValue",
"status" : 1
}
(Note the unchanged _uid. It's the same object)
Now I have a third import job that takes all objects with status one, takes their ID (the ID!!! Not their _uid!) and creates a new object with the same ID, but different UID:
{
"_uid" : 5678
"id" : 1
"completelyDifferentProperty" : "someValue"
"status" : 2
}
So now, for each ID, I have two objects: One with status = 1, One with status = 2.
For the last job I need to make sure that it only picks objects with status =1 that DO NOT YET have a corresponding status=2 object.
So I need a query to the effect of "Get all objects where status == 1 for which no status == 2 object with the same ID exists".
I have a feeling aggregations might help me but I haven't gotten it figured out yet.
Upvotes: 2
Views: 345
Reputation: 8718
You can do it fairly easily with a parent/child relationship. This is sort of a special-case use of the capability, but I think it could be used to solve your problem.
To test it out, I set up an index like this, with parent_doc
type and a child_doc
type (I only included the properties necessary to set up the capability; it doesn't hurt to add more in your documents):
PUT /test_index
{
"mappings": {
"parent_doc": {
"_id": {
"path": "id"
},
"properties": {
"id": {
"type": "long"
},
"_uid": {
"type": "long"
},
"status": {
"type": "integer"
}
}
},
"child_doc": {
"_parent": {
"type": "parent_doc"
},
"_id": {
"path": "id"
},
"properties": {
"id": {
"type": "long"
},
"_uid": {
"type": "long"
},
"status": {
"type": "long"
}
}
}
}
}
Then I added four docs; three parents, one child. There is one document that has "status: 1
that doesn't have a corresponding child document.
POST /test_index/_bulk
{"index":{"_type":"parent_doc"}}
{"_uid":1234,"id":1,"name":"someName","newProperty":"someValue","status":0}
{"index":{"_type":"parent_doc"}}
{"_uid":1234,"id":2,"name":"someName","newProperty":"someValue","status":1}
{"index":{"_type":"child_doc","_parent":2}}
{"_uid":5678,"id":2,"completelyDifferentProperty":"someValue","status":2}
{"index":{"_type":"parent_doc"}}
{"_uid":4321,"id":3,"name":"anotherName","newProperty":"anotherValue","status":1}
We can find the document we want like this; notice we are querying only the parent_doc
type, and that our conditions are that status
is 1
and no child (at all) exists:
POST /test_index/parent_doc/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"status": 1
}
},
{
"not": {
"filter": {
"has_child": {
"type": "child_doc",
"query": {
"match_all": {}
}
}
}
}
}
]
}
}
}
}
}
This returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "3",
"_score": 1,
"_source": {
"_uid": 4321,
"id": 3,
"name": "anotherName",
"newProperty": "anotherValue",
"status": 1
}
}
]
}
}
Here's all the code I used to test it:
http://sense.qbox.io/gist/d1a0267087d6e744b991de5cdec1c31d947ebc13
Upvotes: 1