Francesco Abeni

Reputation: 4265

Elasticsearch: how to apply multiple filters to the same value?

In short: when a field has multiple values, how can I get only those items where both of my filters apply to the SAME value in the multi-valued field?

Details

I have stored some items in Elasticsearch which have a nested field with multiple values, e.g.

"hits": [
    {
        "name": "John",
        "tickets": [
            {
                "color": "green",
                "code": "001"
            },
            {
                "color": "red",
                "code": "002"
            }
        ]
    },
    {
        "name": "Frank",
        "tickets": [
            {
                "color": "red",
                "code": "001"
            },
            {
                "color": "green",
                "code": "002"
            }
        ]
    }
]

Now consider these filters:

...
filter: [
    { terms: { 'tickets.code': ['001'] } },
    { terms: { 'tickets.color': ['green'] } },
]
...

Both items match, because each of them has at least one ticket with code "001" and at least one ticket with color "green".

How do I write my filters so that only the first item matches, because it has a single ticket with code "001" AND color "green"?

Thank you in advance for any suggestion.

Upvotes: 2

Views: 79

Answers (1)

dshockley

Reputation: 1494

Your problem is caused by the fact that Elasticsearch flattens arrays of objects. So internally, your data is represented something like this:

{
    "name": "John",
    "tickets.color": ["green", "red"],
    "tickets.code": ["001", "002"]
},
{
    "name": "Frank",
    "tickets.color": ["red", "green"],
    "tickets.code": ["001", "002"]
}

It's impossible to know which color and code are on the same object. (The original source is also stored, in order to be returned when you make a request, but that's not the data that's queried when you search.)
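
To make the effect concrete, the flattening can be imitated in a few lines of Python. This is only an illustrative sketch: `flatten` and `matches` are hypothetical helpers written for this example, not Elasticsearch APIs, and they mimic the behaviour described above rather than the real index structure.

```python
# Illustrative only: imitates how an array of objects is flattened into
# one list of values per field, losing the pairing of color and code.
docs = [
    {"name": "John", "tickets": [{"color": "green", "code": "001"},
                                 {"color": "red", "code": "002"}]},
    {"name": "Frank", "tickets": [{"color": "red", "code": "001"},
                                  {"color": "green", "code": "002"}]},
]


def flatten(doc):
    """Collapse the tickets array into one list of values per field."""
    flat = {"name": doc["name"]}
    for ticket in doc["tickets"]:
        for key, value in ticket.items():
            flat.setdefault("tickets." + key, []).append(value)
    return flat


def matches(flat, filters):
    """Every filter must find its value somewhere in the field's list."""
    return all(value in flat.get(field, [])
               for field, value in filters.items())


filters = {"tickets.code": "001", "tickets.color": "green"}
matched = [d["name"] for d in docs if matches(flatten(d), filters)]
print(matched)  # both John and Frank match
```

Each filter is checked against the whole list of values for its field, so Frank matches even though his "001" ticket is red.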

There are two potential solutions here: denormalization, or the nested data type. If you can at all get away with it, denormalization is the better choice here, because it's more efficient. If you denormalize your data, you end up with one document per ticket, something like this:

{
    "name": "John",
    "ticket": {
        "color": "green",
        "code": "001"
    }
},
{
    "name": "John",
    "ticket": {
        "color": "red",
        "code": "002"
    }
},
{
    "name": "Frank",
    "ticket": {
        "color": "red",
        "code": "001"
    }
},
{
    "name": "Frank",
    "ticket": {
        "color": "green",
        "code": "002"
    }
}
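
With one ticket per document, two ordinary term filters necessarily refer to the same ticket. A minimal sketch of that idea in Python follows; `denormalize` is a hypothetical helper for this example, and the search body assumes `ticket.code` and `ticket.color` are mapped as keyword fields.

```python
def denormalize(doc):
    """Split one item with many tickets into one document per ticket."""
    return [{"name": doc["name"], "ticket": ticket}
            for ticket in doc["tickets"]]


john = {
    "name": "John",
    "tickets": [
        {"color": "green", "code": "001"},
        {"color": "red", "code": "002"},
    ],
}
rows = denormalize(john)

# Search body for the denormalized documents: each document holds a
# single ticket, so both filters apply to that one ticket.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"ticket.code": "001"}},
                {"term": {"ticket.color": "green"}},
            ]
        }
    }
}
```

The trade-off is that hits are now tickets rather than people, so the same name can appear in several hits.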

If you use the nested data type, you'll have to wrap both filters in a nested query so that they are evaluated against the same ticket, and use a mapping something like this:

{
     "tickets": {
         "type": "nested",
         "properties": {
             "color": {"type": "keyword"},
             "code": {"type": "keyword"}
         }
     }
}
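
For the nested approach, the search itself also changes. The sketch below builds a search body with the `nested` query, written as a Python dict. It assumes the nested field is called `tickets`, as in the question's documents; `path` must be whatever name the field has in your mapping. `nested_match` is a hypothetical local stand-in that only imitates the per-object matching so the expected result can be checked without a cluster.

```python
import json

# Both term filters sit inside one nested query, so they must be
# satisfied by the same element of the tickets array.
query = {
    "query": {
        "nested": {
            "path": "tickets",  # assumed field name, see the note above
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"tickets.code": "001"}},
                        {"term": {"tickets.color": "green"}},
                    ]
                }
            },
        }
    }
}


def nested_match(doc, path, wanted):
    """Local stand-in: true if one single object meets every condition."""
    return any(all(obj.get(key) == value for key, value in wanted.items())
               for obj in doc[path])


docs = [
    {"name": "John", "tickets": [{"color": "green", "code": "001"},
                                 {"color": "red", "code": "002"}]},
    {"name": "Frank", "tickets": [{"color": "red", "code": "001"},
                                  {"color": "green", "code": "002"}]},
]
matched = [d["name"] for d in docs
           if nested_match(d, "tickets", {"code": "001", "color": "green"})]
print(json.dumps(query, indent=2))
print(matched)  # only John has a single ticket that is both 001 and green
```

Note that fields inside the nested query are addressed by their full path (`tickets.code`, not `code`).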

Upvotes: 1
