Littlebobbydroptables
Littlebobbydroptables

Reputation: 3741

ElasticSearch aggregation for filtering users who had both events

I've written a query which perfectly return me 6000+ events my users had:

GET /<app_logs-2022.11.23*>/_search
{
  "query": { 
    "bool": {
      "should": [
        {
          "term": {
            "context.identity.type": "login"
          }
        },
        {
          "term": {
            "context.identity.type": "login_error"
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "_source": [
    "context.identity.user_id",
    "context.identity.type"
  ],
  "size": 3
}

And i get such set of data

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6001,
    "max_score" : 10.722837,
    "hits" : [
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "bb469377-0618-49a6-a643-1201dc84c829",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "72562ad0-4f35-4624-8776-8b555dea851e",
              "type" : "login"
            }
          }
        }
      },
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "8f4e82a0-f333-4096-bfb6-767fed924093",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "72562ad0-4f35-4624-8776-8b555dea851e",
              "type" : "login_error"
            }
          }
        }
      },
      {
        "_index" : "app_logs-2022.11.23-7",
        "_type" : "app",
        "_id" : "7090be5a-8b53-4723-a1ac-223476a000f1",
        "_score" : 10.722837,
        "_source" : {
          "context" : {
            "identity" : {
              "user_id" : "75bcb301-1cee-4b3b-aa1b-adbe4c011388",
              "type" : "login_error"
            }
          }
        }
      }
    ]
  }
}

But i can't figure out how to get a number of users who had both login and login_error events, i've tried cardinality aggregation, terms and several more but all of them just split types into buckets showing sum but doesn't group by user, and i want to find how many users had problems first but then managed to login in the end.

The best i've managed to achieve is to get buckets by user_id and output cardinality for each of them by type

  "aggs": {
    "results": {
      "terms": {
        "field": "context.identity.user_id",
        "size": 300
      },
      "aggs": {
        "events": {
          "cardinality": {
            "field": "context.identity.type"
          }
        }
      }
    }
  }

Upvotes: 1

Views: 51

Answers (1)

Musab Dogan
Musab Dogan

Reputation: 3690

I created an example based on the sentence I want to find how many users had problems first but then managed to login in at the end. It works like this:

  1. Make aggs based on user_id
  2. Make sub-aggs by type
  3. Ignore the ones that don't contain login_error

#put mapping

PUT test_stack_login
{
  "mappings": {
    "properties": {
      "context.identity.user_id": {
        "type": "keyword"
      },
      "context.identity.type": {
        "type": "keyword"
      }
    }
  }
}

#put example docs

POST test_stack_login/_bulk?refresh&pretty
{"index":{}}
{"context.identity.user_id":1,"context.identity.type":"login_error"}
{"index":{}}
{"context.identity.user_id":1,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":2,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":3,"context.identity.type":"login"}
{"index":{}}
{"context.identity.user_id":4,"context.identity.type":"login_error"}
{"index":{}}
{"context.identity.user_id":4,"context.identity.type":"login"}

#Run the query

GET test_stack_login/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "context.identity.user_id",
        "size": 1000
      },
      "aggs": {
        "context_identity_type": {
          "terms": {
            "field": "context.identity.type",
            "size": 10
          }
        },
        "login_error_exist": {
          "bucket_selector": {
            "buckets_path": {
              "var1": "context_identity_type['login_error']>_count"
            },
            "script": "params.var1 != null"
          }
        }
      }
    }
  }
}

#the result will be like in the ss enter image description here

You will get user_id containing both login and login_error information in the context.identity.type field. The keys in the response will give you the user_id of which have logged in at least once unsuccessfully and once successfully logged in.

"buckets" : [
  {"key" : "1" ...},
  {"key" : "4" ...}
]

Upvotes: 2

Related Questions