elasticsearch - comprehensive list of distinct values

Question

I want to find all distinct values of a term over a time range.

Example data:

[
  {
    'a': 123,
    'b': 456,
    'user': 'bob',
  },
  {
    ...
    'user': 'sally',
    ...
  },
  {
    ...
    'user': 'bob',
    ...
  },
  {
    'x': 2,
    'y': 3,
  }
]

In this case I want to find all distinct values of user.

Note that some users will appear in multiple records, and not all records have a user.

Also, the list of returned users MUST be comprehensive (i.e., if a record exists with a certain user, then that user MUST appear in the results).

Having the number of occurrences of each user would be nice too, but not required.

I considered Cardinality Aggregations but I'm concerned about the 'approximate' nature of the results. I need a comprehensive list of users.

How can I do this in elasticsearch?

NikoNyrh · Accepted Answer

As mentioned in the comments, a terms aggregation is what you are looking for. Results are approximate only if you request the N most common terms and the data is split across multiple shards.

You can set the aggregation's size to zero to get "unlimited" (Integer.MAX_VALUE) results.
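
A minimal sketch of what such a request could look like, written as plain Python dictionaries so it runs without a cluster. The helper names (`build_distinct_users_query`, `extract_counts`), the aggregation name `distinct_users`, and the time field `@timestamp` are assumptions made for illustration; the question does not name the timestamp field. It also assumes `user` is indexed as an exact (not analyzed) value. Records without a `user` field simply do not land in any bucket. Note that the `size: 0` trick inside `terms` applies to older Elasticsearch releases; from 5.0 onward it is rejected there, so pass an explicit positive size instead.

```python
import json


def build_distinct_users_query(start, end, time_field="@timestamp",
                               term_field="user", agg_size=0):
    """Build a search body listing every distinct value of term_field.

    time_field is a hypothetical name; replace it with the real one.
    agg_size=0 means "unlimited" only on older Elasticsearch versions;
    newer versions need an explicit positive number here.
    """
    return {
        # Top-level size 0: return no hits, only the aggregation.
        "size": 0,
        "query": {
            "range": {time_field: {"gte": start, "lte": end}}
        },
        "aggs": {
            "distinct_users": {
                "terms": {"field": term_field, "size": agg_size}
            }
        },
    }


def extract_counts(response):
    """Map each distinct value to its number of occurrences."""
    buckets = response["aggregations"]["distinct_users"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    body = build_distinct_users_query("2015-01-01", "2015-01-31")
    # This JSON is what would be sent to the index's _search endpoint.
    print(json.dumps(body, indent=2))
```

Each bucket in the response carries a `key` (the user) and a `doc_count`, so `extract_counts` also covers the optional "number of occurrences" part of the question.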
