d8aninja
d8aninja

Reputation: 3633

How do I get the length of an array using an elasticsearch query in the ELK stack?

I am using Kibana and have an index that looks like this

GET index_name/

{
  "index_name": {
    "aliases": {},
    "mappings": {
      "json": {
        "properties": {
          "scores": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }

I would like to get the length of the scores array (ie, how many text elements it has in it) for each record, with the end goal of filtering out records whose length is greater than or equal to 20. So far, I'm able to identify (highlight) each of the records that IS "20" but can't seem to build a filter that I could then turn into a boolean value (1 for true) for later use / summing records that satisfy the condition. I am putting this into the Discover Panel's filter, after clicking on 'Edit Query DSL':

{
  "query": {
    "match": {
      "scores": {
        "query": "20",
        "type": "phrase"
      }
    }
  }
}

EDIT: an example of this field in the document is:

scores:12, 12, 12, 20, 20, 20

In the table tab view, it has a t next to it, signifying text. The length of this field varies anywhere from 1 to over 20 items from record to record. I also don't know how to get the length of this field (only) returned to me with a query, but I have seen some other answers that suggest something like this (which produces an error for me):

"filter" : {
    "script" : {
        "script" : "doc['score'].values.length > 10"
    }
}

Upvotes: 1

Views: 7239

Answers (1)

Benjamin Trent
Benjamin Trent

Reputation: 7566

There are a couple of options

This is to find where the number of items of any size (separated via ,).

{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source":"/, /.split(doc['score.keyword'].value).length > 20"
          }
        }
      }
    }
  }
}

NOTE: for the above solution setting script.painless.regex.enabled: true in elasticsearch.yml is required.

If all the scores are of a specific size (i.e. all just two digits), a string length (as you were attempting) would work:

{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source":"doc['scores.keyword'].value.length() > 78"
          }
        }
      }
    }
  }
}

I chose 78 as each item (assuming 2 digits) is 2 digits + ,== 4, you want to see larger than 20, that is 19 * 4 + 2.

If you are concerned about the size of this array of scores often, you should probably store it as such. You can do processing in your ingest pipeline with the split processor to achieve this.

Upvotes: 1

Related Questions