Tirion
Tirion

Reputation: 147

How to use query to differentiate whether a field doesn't exist or it exists but its value is an empty array?

What I want to achieve is that:

  1. When a field doesn't exist, this document should be returned.
  2. When a field exists but its value is an empty array, this document should NOT be returned.

It seems that I cannot use exists query because it doesn't differentiate the two cases.

Should I use something like below?

GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "weekly_hours.monday_hours"
        }
      },
      "filter": {
        "script": {
          "script": {
            "source": "doc['weekly_hours.monday_hours'].size != 0",
            "lang": "painless"
          }
        }
      }
    }
  }
}

This way, both the situations mentioned above will return true in the must_not section, then in the filter the 2nd situation will be filtered out as its size is 0.

Is that correct?

Is there any simpler way to do it?

Thanks in advance!

Update:

I tried the script mentioned above. I guess the idea works but an exception Fielddata is not supported on field [weekly_hours.monday_hours] of type [date_range] occurred, as shown below.

{
    "took": 12,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 4,
        "failed": 1,
        "failures": [
            {
                "shard": 3,
                "index": "items",
                "node": "jRtjvF1mTzG_jPTqkDo2uA",
                "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "script_stack": [
                        "org.elasticsearch.index.mapper.MappedFieldType.fielddataBuilder(MappedFieldType.java:103)",
                        "org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:111)",
                        "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:87)",
                        "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:84)",
                        "java.security.AccessController.doPrivileged(Native Method)",
                        "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:84)",
                        "doc['weekly_hours.monday_hours'].size() != 0",
                        "    ^---- HERE"
                    ],
                    "script": "doc['weekly_hours.monday_hours'].size() != 0",
                    "lang": "painless",
                    "caused_by": {
                        "type": "illegal_argument_exception",
                        "reason": "Fielddata is not supported on field [weekly_hours.monday_hours] of type [date_range]"
                    }
                }
            }
        ]
    },
    "hits": {
        "total": 0,
        "max_score": null,
        "hits": []
    }
}

The monday_hours field is of type date_range. Does that mean I cannot use script on this field at all? If so, how to achieve the effect I described at the beginning of the question? Really need help here...

Upvotes: 2

Views: 882

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16933

Doing a must_not on weekly_hours.monday_hours would technically filter out all the missing docs so calling doc['weekly_hours.monday_hours'].size() != 0 wouldn't have any further effect. I said technically because ES does not treat all missing values the same -- I'll elaborate below.

What's worse is that date_range fields currently (>=7.9) don't support the null_value parameter which is already possible in other field types and makes one's life easier in situations like these.

I tested a few variations of emptyish fields:

1.
{
  "weekly_hours":{
    
  }
}

2.
{
  "weekly_hours":{
    "monday_hours":{
      
    }
  }
}

3.
{
 "weekly_hours":{
    "monday_hours":{
      "gte":null,
      "lte":null
    }
  }
}

and finally a valid one

{
  "weekly_hours":{
    "monday_hours":{
      "gte":"2015-10-31 12:00:00",
      "lte":"2015-11-01"
    }
  }
}

Now, I'd expect the following query to return only 4. but it does not and I think it's worth filing a bug report:

{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "weekly_hours.monday_hours"
          }
        },
        {
          "range": {
            "weekly_hours.monday_hours": {
              "gte": "1970-01-01",
              "lte": "2999-12-31"
            }
          }
        },
        {
          "query_string": {
            "query": "_exists_:weekly_hours AND _exists_:weekly_hours.monday_hours AND weekly_hours.monday_hours:[1970-01-01 TO 2999-12-31] AND NOT weekly_hours.monday_hours:[null TO null]"
          }
        }
      ]
    }
  }
}

One final remark: wouldn't it be simpler to use integer_ranges instead of date_ranges for day-of-week hours? Now, given that all these ranges belong to the same field type family, I'd go ditch them for now and follow this neat opening hours approach.

Upvotes: 1

Related Questions