Daniel Revell
Daniel Revell

Reputation: 8586

Elasticsearch COUNT of DISTINCT in GROUP BY

There are a few questions similar to this, but not quite the same thing. Feels like I'm diving straight into the deep end with Elastic queries and would appreciate some advice.

This is the SQL, and I can't quite figure out how to query this type of aggregation.

SELECT COUNT(DISTINCT session_id), event_type
FROM events
GROUP BY event_type

I'd like to get a count for each event type, but only unique for a given user session.

e.g. Only count the event 'page-view' once for each user session, effectively unique page views.

Upvotes: 3

Views: 7955

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16925

If your index mapping defines both the event_type and the session_id as keywords:

PUT /events
{
  "mappings": {
    "properties": {
      "event_type": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword"
      }
    }
  }
}

you could use the following aggregation query:

POST /events/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.distinct_session_id
{
  "size": 0,
  "aggs": {
    "group_by_event_type": {
      "terms": {
        "field": "event_type",
        "size": 10
      },
      "aggs": {
        "distinct_session_id": {
          "cardinality": {
            "field": "session_id"
          }
        }
      }
    }
  }
}

which'd yield:

{
  "aggregations" : {
    "group_by_event_type" : {
      "buckets" : [
        {
          "key" : "page_view",
          "distinct_session_id" : {
            "value" : 2
          }
        }, 
        { 
          ...
        }
      ]
    }
  }
}

Note that "size": 10 is a default setting essentially equivalent to SQL's LIMIT clause.

Speaking of SQL, Elasticsearch supports SQL access too, albeit only within an X-Pack license. When you have one, you can do:

GET _sql?format=txt
{
  "query": """
    SELECT COUNT(DISTINCT session_id) distnct_sessions, event_type
    FROM events
    GROUP BY event_type
  """
}

which'd yield

distnct_sessions|  event_type   
----------------+---------------
2.0             |page_view      
...

Upvotes: 6

Related Questions