hard coder
hard coder

Reputation: 5735

Elasticsearch Group by query with custom aggregation logic

I have two fiends in Elasticsearch

URI
BROWSER

I want to find all those URI which were not hit from a particular browser, say chrome I want to write following query

1. Group by URI,
2. find distinct BROWSER set,
3. filter URIs where chrome is not in BROWERS set.

return result.

I have compeleted query for 1st step

{
  "aggs": {
    "uri_agg": {
      "terms": {
        "field": "uri.keyword",
        "size": 50
      },
      "aggs": {
        "not_hit_by_chrome": {

        }
      }
    }
  }
}

I am not getting how to achieve step 2 and 3 in this query.

Upvotes: 0

Views: 828

Answers (1)

Kamal Kunjapur
Kamal Kunjapur

Reputation: 8860

There are basically two ways to achieve this.

Solution 1: Using Elasticsearch DSL

I've simply filtered the documents not having chrome in the browser using Bool Query and post that I've simply made use of two Terms Aggregation for what you are looking for. That way, it would be more efficient as compared to applying filter on aggregation.

Structure of the query would be:

- Bool Query
- Terms Aggregation (Parent for uri)
  - Terms Aggregation (Child for browsers)

Note that I'm assuming both your fields 'uri' and browser are of type keyword

Sample Documents:

POST myindex/mydocs/1
{
  "uri": "www.google.com",
  "browser": "chrome"
}

POST myindex/mydocs/2
{
  "uri": "www.google.com",
  "browser": "firefox"
}

POST myindex/mydocs/3
{
  "uri": "www.google.com",
  "browser": "iexplorer"
}

Query:

POST myindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "browser": "chrome"
          }
        }
      ]
    }
  }, 
  "aggs": {
    "myuri": {
      "terms": {
        "field": "uri",
        "size": 10
      },
      "aggs": {
        "mybrowsers": {
          "terms": {
            "field": "browser",
            "size": 10
          }
        }
      }
    }
  }
}

Response

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myuri": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "www.google.com",
          "doc_count": 2,
          "mybrowsers": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "firefox",
                "doc_count": 1
              },
              {
                "key": "iexplorer",
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}

Solution 2: Using Elasticsearch SQL Access

If you are using xpack and if you would like to achieve this via SQL Access, then your query would translate to simple SQL query as below:

POST /_xpack/sql?format=txt
{
  "query": "SELECT uri, browser, count(1) FROM myindex WHERE browser <> 'chrome' GROUP BY uri, browser"

}

Response

      uri      |    browser    |   COUNT(1)    
---------------+---------------+---------------  
www.google.com |firefox        |1              
www.google.com |iexplorer      |1    

Let me know if this helps!

Upvotes: 1

Related Questions