stefanobaldo

Reputation: 2063

Elasticsearch aggregate by multiple fields separately

I have an index with 2 fields and some documents, like the following:

city                team
=========================================
New York            New York Knicks
New York            Brooklyn Nets
New Orleans         New Orleans Pelicans

My goal is to provide an autocomplete that searches on both fields, like this:

Query: [ new                  ]
       +----------------------+
       |     Cities           |
       +----------------------+
       | New York             |
       | New Orleans          |
       +----------------------+
       |     Teams            |
       +----------------------+
       | New York Knicks      |
       | New Orleans Pelicans |
       +----------------------+

My query to filter the documents is quite simple:

"query": {
    "bool": {
        "should": [
            {
                "match_phrase_prefix": {
                    "city": "new"
                }
            },
            {
                "match_phrase_prefix": {
                    "team": "new"
                }
            }
        ]
    }
}

However, I am having trouble with the aggregations. My first approach was:

"aggs": {
    "city": {
        "terms": {
            "field": "city.raw"
        }
    },
    "team": {
        "terms": {
            "field": "team.raw"
        }
    }
}

(raw is a not_analyzed copy of fields for aggregation purposes)

That didn't work because Brooklyn Nets was included in the team buckets, and it should not be:

"aggregations": {
    "city": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "New York",
                "doc_count": 2
            },
            {
                "key": "New Orleans",
                "doc_count": 1
            }
        ]
    },
    "team": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "Brooklyn Nets",
                "doc_count": 1
            },
            {
                "key": "New Orleans Pelicans",
                "doc_count": 1
            },
            {
                "key": "New York Knicks",
                "doc_count": 1
            }
        ]
    }
}

I have no idea how to get this to work in a single request. This example is just illustrative; in the real scenario I have many more fields and documents to search and aggregate, so making multiple requests to the server would not be a good idea, especially because an autocomplete system should be as fast as possible.

Any help will be appreciated.

Upvotes: 1

Views: 343

Answers (2)

sanurah

Reputation: 1132

Your query,

"query": {
    "bool": {
        "should": [
            {
                "match_phrase_prefix": {
                    "city": "new"
                }
            },
            {
                "match_phrase_prefix": {
                    "team": "new"
                }
            }
        ]
    }
}

returns the document with "City:New York Team:Brooklyn Nets" in the results, because its "city" field has the prefix "new" even though its "team" field does not.

Aggregations run over every document the query matches. Since the "Team:Brooklyn Nets" document is in the result set because of "City:New York", its team value gets counted in the team buckets as well.

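To check this, set minimum_should_match to 2 in the bool query: only documents matching both clauses are then returned, and Brooklyn Nets drops out of the buckets.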
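
The effect can be reproduced without a cluster. The following is only a toy in-memory model, not Elasticsearch itself: `prefix_match`, `search` and `terms_agg` are hypothetical helpers, and the prefix check is a simplified stand-in for match_phrase_prefix with a single-word input.

```python
from collections import Counter

# The three documents from the question.
DOCS = [
    {"city": "New York", "team": "New York Knicks"},
    {"city": "New York", "team": "Brooklyn Nets"},
    {"city": "New Orleans", "team": "New Orleans Pelicans"},
]


def prefix_match(value, prefix):
    # Simplified stand-in for match_phrase_prefix with one input word:
    # true if any word of the field value starts with the prefix.
    return any(w.lower().startswith(prefix.lower()) for w in value.split())


def search(prefix, fields, minimum_should_match=1):
    # Model of bool/should: keep a document when at least
    # minimum_should_match of the per-field clauses match.
    return [
        doc for doc in DOCS
        if sum(prefix_match(doc[f], prefix) for f in fields) >= minimum_should_match
    ]


def terms_agg(docs, field):
    # Model of a terms aggregation: count each distinct value of the
    # field across ALL matched documents, whichever clause matched them.
    return Counter(doc[field] for doc in docs)


hits = search("new", ["city", "team"])
team_buckets = terms_agg(hits, "team")

strict_hits = search("new", ["city", "team"], minimum_should_match=2)
strict_team_buckets = terms_agg(strict_hits, "team")

print(sorted(team_buckets))
print(sorted(strict_team_buckets))
```

In this model the first run keeps Brooklyn Nets in the team buckets and the second does not. Note that requiring both clauses would also drop a document whose team starts with "new" while its city does not, so this works as a check rather than as a fix for the autocomplete.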

Upvotes: 0

Andrei Stefan

Reputation: 52366

You need one filter aggregation per field, so that each terms aggregation only sees the documents matching that field's own clause from the query:

  "aggs": {
    "city": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match_phrase_prefix": {
                  "city": "new"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "cities": {
          "terms": {
            "field": "city.raw"
          }
        }
      }
    },
    "team": {
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match_phrase_prefix": {
                  "team": "new"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "teams": {
          "terms": {
            "field": "team.raw"
          }
        }
      }
    }
  }
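
Since the question mentions many more fields in the real scenario, the request body could be generated rather than written by hand. Below is a minimal sketch, assuming every searched field has a `.raw` sub-field as in the question; `build_autocomplete_body` and the inner aggregation name `values` are hypothetical names chosen for illustration. It mirrors the `query` wrapper used in the filters above, which may not be needed (or accepted) on later Elasticsearch versions.

```python
import json


def build_autocomplete_body(text, fields, raw_suffix=".raw"):
    """Build one request body: a should-query over all fields plus one
    filter aggregation per field, each wrapping a terms aggregation."""

    def prefix_clause(field):
        return {"match_phrase_prefix": {field: text}}

    body = {
        "query": {"bool": {"should": [prefix_clause(f) for f in fields]}},
        "aggs": {},
    }
    for field in fields:
        body["aggs"][field] = {
            # Restrict the buckets to documents whose own field matches.
            "filter": {"bool": {"must": [{"query": prefix_clause(field)}]}},
            "aggs": {
                # "values" is an arbitrary name for the inner aggregation.
                "values": {"terms": {"field": field + raw_suffix}}
            },
        }
    return body


body = build_autocomplete_body("new", ["city", "team"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a single search request; each field's suggestions would then be read from the `values` buckets under that field's aggregation in the response.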

Upvotes: 1
