writofmandamus

Reputation: 1231

Significant terms aggregation for arrays in elasticsearch

I am unable to perform a significant terms aggregation on a field that holds an array. My JavaScript query looks like this:

client.search({
  index: myIndex,
  body: {
    query: {
      terms: {
        myField: ['someuserid']
        // also tried with same result... myField: 'someuserid'
      }
    },
    aggregations: {
      recommendations: {
        significant_terms: {
          field: "myField",
          min_doc_count: 1
        }
      }
    }
  }
})

I get this error:

(node:13105) UnhandledPromiseRejectionWarning: Unhandled promise rejection 
(rejection id: 1): Error: [illegal_argument_exception] Fielddata is disabled 
on text fields by default. Set fielddata=true on [myField] in order to 
load fielddata in memory by uninverting the inverted index. Note that this can 
however use significant memory.

My mapping looks like this:

{
  index: 'myIndex',
  type: 'users',
  body: {
    properties: {
        'myField': []
    }
  }
}

I know that I don't need to explicitly map array data types, but I do it so I can easily see what fields I have for a certain type. Following the error message I would change my mapping to look like this:

...
properties: {
  myField: {
    fielddata: "true"
  }
}
...

However, this results in this error:

Error: [mapper_parsing_exception] No type specified for field [myField]

If I then add a type:

...
properties: {
  myField: {
    type: [],
    fielddata: "true"
  }
}
...

I get this error:

[mapper_parsing_exception] No handler for type [[]] declared on field [myField]

Currently, the data I am aggregating is seeded entirely through the JavaScript client library, using Update API requests constructed like this:

const update = {
  "upsert": {
    "myField": ['myValue']
  },
  "script": {
    "inline": "ctx._source.myField.add(params.itemField)",
    "params": {
      "itemField": 'itemValue'
    }
  }
};

const req = {
  index: 'myIndex',
  type: 'users',
  id: 'someuserid',
  body: update
};

The hits returned by curl -XGET 'localhost:9200/myIndex/users/_search?pretty' then look like this:

...
{
    "_index" : "myIndex",
    "_type" : "users",
    "_id" : "someuserid",
    "_score" : 1.0,
    "_source" : {
      "myField" : [
        "someFieldId1",
        "someFieldId1",
        "someFieldId2"
      ]
    }
  },
...

How can I properly perform a significant terms aggregation using a field that is an array?

Upvotes: 0

Views: 404

Answers (1)

deathyr

Reputation: 423

https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html

In Elasticsearch, there is no dedicated array type. Any field can contain zero or more values by default, however, all values in the array must be of the same datatype.

Assuming you are using Elasticsearch 5.x, try changing type: [] to type: "text" or type: "keyword". The type describes the values inside the array, not the array itself.

For the difference between the two I would recommend reading this: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html

However, in your case, since the values look like some kind of ID, they likely don't need to be analyzed, so I would suggest "keyword" instead of "text". A keyword field can be aggregated on directly, so the fielddata setting is not needed.
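A minimal sketch of that suggestion, assuming Elasticsearch 5.x and the JavaScript client used in the question. The helper names buildIndexBody and buildSearchBody are hypothetical, and the sketch assumes the index is created fresh (or reindexed), since the type of an existing field cannot be changed in place. The code only builds the request bodies; the client calls appear as comments.

```javascript
// Hypothetical helper: index body that maps myField as a keyword.
// An array needs no special mapping; the type describes each element.
function buildIndexBody() {
  return {
    mappings: {
      users: {
        properties: {
          myField: { type: 'keyword' }
        }
      }
    }
  };
}

// Hypothetical helper: the same search body as in the question,
// parameterized by the value to match in myField.
function buildSearchBody(value) {
  return {
    query: {
      terms: { myField: [value] }
    },
    aggregations: {
      recommendations: {
        significant_terms: { field: 'myField', min_doc_count: 1 }
      }
    }
  };
}

// Assumed usage with the client from the question:
// client.indices.create({ index: myIndex, body: buildIndexBody() })
//   .then(() => client.search({ index: myIndex, body: buildSearchBody('someuserid') }));
```

Keeping the mapping and the query as plain objects makes it easy to inspect exactly what is sent before involving the cluster.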

For versions of ES before 5.0, use "string" instead: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/string.html
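For those older versions, a sketch of the equivalent mapping might look like the following. It assumes Elasticsearch 2.x, where string fields are analyzed by default, so index: "not_analyzed" is set to keep each ID as a single term, much like "keyword" does in 5.x. The variable name legacyMapping is hypothetical, and the mapping is assumed to be applied before any documents define the field.

```javascript
// Hypothetical 2.x-style mapping body: a non-analyzed string field.
// Each array element is indexed as one exact term.
const legacyMapping = {
  properties: {
    myField: { type: 'string', index: 'not_analyzed' }
  }
};

// Assumed usage with the client from the question:
// client.indices.putMapping({ index: myIndex, type: 'users', body: legacyMapping });
```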

Upvotes: 1
