Lakshman Battini

Reputation: 1912

Elasticsearch: select multiple distinct fields group by id

I have an ES index with documents in the formats below; the documents are parsed from logs with a Fluentd parser and indexed into ES. Record format:

{"id": "id1", "field1": "f1_val", "message": "XXXX", "time": "XXXX"}

{"id": "id1", "field2": "f2_val", "message": "XXXX", "time": "XXXX"}

{"id": "id1", "field3": "f3_val", "field4": "f4_val", "message": "XXXX", "time": "XXXX"}

I want to group the documents by the id field and merge their fields together, so that I can visualize the data as a table in a Kibana dashboard, like this:

{"id": "id1", "field1": "f1_val", "field2": "f2_val", "field3": "f3_val", "field4": "f4_val"}

in Kibana:

    Id     Field1     Field2     Field3     Field4
    id1    f1_val     f2_val     f3_val     f4_val       

How can I group the documents by id and select the distinct field values in Elasticsearch? Thanks!

Upvotes: 0

Views: 1201

Answers (2)

Albin

Reputation: 41

Since Elasticsearch doesn't support joins very well, and by extension neither does Kibana, I would recommend joining the documents in your application before putting them into the index (there is a sketch of this at the end of this answer). If that is not possible, I would go for a transform, as recommended in:

https://discuss.elastic.co/t/combine-multiple-document-into-one-document-with-limited-fields-merging-of-documents/231758

Using this I could achieve something like the following in my dashboard: [Result image]

Steps to reproduce:

  1. Create the log index:
    PUT log_index
  2. Add some data:
    POST log_index/_doc/
    {"id": "1", "field1": "The"}

    POST log_index/_doc/
    {"id": "1", "field2": "quick"}

    POST log_index/_doc/
    {"id": "1", "field3": "brown", "field4": "fox"}

    POST log_index/_doc/
    {"id": "2", "field1": "jumped"}

    POST log_index/_doc/
    {"id": "2", "field2": "over"}

    POST log_index/_doc/
    {"id": "2", "field3": "the"}

    POST log_index/_doc/
    {"id": "2", "field4": "lazy"}
  3. Transform the log index into a joined index (I'm pretty sure the scripted metrics can be written better; this was the first thing that worked):
    PUT _transform/join_logs
    {
      "source": {
        "index": [ 
          "log_index"
        ]
      },
      "pivot": {
        "group_by": {
          "id.keyword": {
            "terms": {
              "field": "id.keyword"
            }
          }
        },
        "aggregations": { 
          "field1": {
            "scripted_metric": {
              "init_script": "state.docs = []",
              "map_script": "state.docs.add(new HashMap(params['_source']))",
              "combine_script": "for (t in state.docs) { if(t.get('field1') != null){ return t.get('field1')}}  return null",
              "reduce_script": "states"
            }
          },
          "field2": {
            "scripted_metric": {
              "init_script": "state.docs = []",
              "map_script": "state.docs.add(new HashMap(params['_source']))",
              "combine_script": "for (t in state.docs) { if(t.get('field2') != null){ return t.get('field2')}}  return null",
              "reduce_script": "states"
            }
          },
          "field3": {
            "scripted_metric": {
              "init_script": "state.docs = []",
              "map_script": "state.docs.add(new HashMap(params['_source']))",
              "combine_script": "for (t in state.docs) { if(t.get('field3') != null){ return t.get('field3')}}  return null",
              "reduce_script": "states"
            }
          },
          "field4": {
            "scripted_metric": {
              "init_script": "state.docs = []",
              "map_script": "state.docs.add(new HashMap(params['_source']))",
              "combine_script": "for (t in state.docs) { if(t.get('field4') != null){ return t.get('field4')}}  return null",
              "reduce_script": "states"
            }
          }
        }
      },
      "dest": { 
        "index": "joined_index"
      }
    }
  4. Run the transformation (see the sketch right after this list for starting it and checking the result).
  5. Create an index pattern for the joined index.
  6. Open it up in Discover and create a table. Save it and add it to the dashboard.
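
For step 4, a minimal sketch of starting the transform and checking the destination index, assuming the elasticsearch Python client (8.x) and a local cluster; join_logs and joined_index are the names used in the steps above:

    import time

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Start the batch transform defined above; it writes its output into joined_index.
    es.transform.start_transform(transform_id="join_logs")

    # The transform runs asynchronously, so poll its stats until it has finished.
    while True:
        stats = es.transform.get_transform_stats(transform_id="join_logs")
        if stats["transforms"][0]["state"] == "stopped":
            break
        time.sleep(1)

    # Inspect the joined documents that will back the index pattern used in Discover.
    es.indices.refresh(index="joined_index")
    for hit in es.search(index="joined_index")["hits"]["hits"]:
        print(hit["_source"])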

My assumption in doing this was that each field only occurs once among the documents for a given id. I don't know what happens if fields overlap between the documents.
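
For the application-side join mentioned at the start, a minimal sketch, again assuming the elasticsearch Python client (8.x); merge_records and joined_index are illustrative names, not part of the question's setup:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def merge_records(records):
        # Merge partial log records that share the same id into one document per id.
        merged = {}
        for rec in records:
            doc = merged.setdefault(rec["id"], {})
            for key, value in rec.items():
                # Keep the first value seen for each field (same assumption as above:
                # each field is only expected to appear once per id).
                doc.setdefault(key, value)
        return merged

    records = [
        {"id": "id1", "field1": "f1_val", "message": "XXXX", "time": "XXXX"},
        {"id": "id1", "field2": "f2_val", "message": "XXXX", "time": "XXXX"},
        {"id": "id1", "field3": "f3_val", "field4": "f4_val", "message": "XXXX", "time": "XXXX"},
    ]

    for doc_id, doc in merge_records(records).items():
        # Using the id as the document _id keeps re-indexing the same id idempotent.
        es.index(index="joined_index", id=doc_id, document=doc)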

Upvotes: 0

sjishan

Reputation: 3652

You can group on id.keyword with a terms aggregation and collect the distinct values of each field in sub-aggregations:

{
    "size": 0,
    "aggs": {
        "id_agg": {
            "terms": {
                "field": "id.keyword"
            },
            "aggs": {
                "by_field1": {
                    "terms": {
                        "field": "field1.keyword"
                    }
                },
                "by_field2": {
                    "terms": {
                        "field": "field2.keyword"
                    }
                },
                "by_field3": {
                    "terms": {
                        "field": "field3.keyword"
                    }
                },
                "by_field4": {
                    "terms": {
                        "field": "field4.keyword"
                    }
                }
            }
        }
    }
}
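
The response is nested term buckets rather than the flat table from the question, so a little client-side flattening is still needed. A minimal sketch, assuming the elasticsearch Python client (8.x) and the log_index name from the other answer:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

fields = ["field1", "field2", "field3", "field4"]

# Same aggregation as above, built programmatically for each field.
resp = es.search(
    index="log_index",
    size=0,
    aggregations={
        "id_agg": {
            "terms": {"field": "id.keyword"},
            "aggs": {"by_" + f: {"terms": {"field": f + ".keyword"}} for f in fields},
        }
    },
)

# Flatten the nested term buckets into one row per id.
for bucket in resp["aggregations"]["id_agg"]["buckets"]:
    row = {"id": bucket["key"]}
    for f in fields:
        values = [b["key"] for b in bucket["by_" + f]["buckets"]]
        # Pick the single distinct value if present (at most one is expected per id).
        row[f] = values[0] if values else None
    print(row)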

Upvotes: 0
