How to find the most occurring attribute values count in documents of a ArangoDB collection?

Question

I have a collection in ArangoDB where each document contains some attributes like

{
    "contributor_name": "Rizano",
    "action": "create",
    "id": 3633,
    "type": "newusers",
    "logtitle": "What to do",
    "timestamp": "2006-07-05",
    "contributor_id": 7878
}

The collection contains millions of documents. Now I want to find out which contributor_name is most occurring in the documents and their count.

CodeManX · Accepted Answer

You can simply group by contributor_name and use the special COLLECT syntax variant WITH COUNT INTO ... to efficient compute how often each value occurs in the dataset:

FOR doc IN coll
  COLLECT name = doc.contributor_name WITH COUNT INTO count
  RETURN { name, count }

The result may look like this:

[
  { "name": "Rizano", "count": 5 },
  { "name": "Felipe", "count": 8 },
  ...
]

You may merge the result together like this if you prefer that format:

[
  {
    "Rizano": 5,
    "Felipe": 8
  }
  ...
]

Query:

RETURN MERGE(
  FOR doc IN coll
    COLLECT name = doc.contributor_name WITH COUNT INTO count
    RETURN { [name]: count }
)

You may also sort by count and limit the result to the most occurring values, e.g. like this (only top contributor):

FOR doc IN coll
  COLLECT name = doc.contributor_name WITH COUNT INTO count
  SORT count DESC
  LIMIT 1
  RETURN { name, count }

There's also COLLECT AGGREGATE, although there should be no difference in performance for this particular query:

FOR doc IN coll
  COLLECT name = doc.contributor_name AGGREGATE count = LENGTH(1)
  SORT count DESC
  LIMIT 1
  RETURN { name, count }

The value passed to LENGTH doesn't really matter, all we want is that it returns a length of 1 (thus increasing the counter by 1 for the given contributor).

How to find the most occurring attribute values count in documents of a ArangoDB collection?

Answers (1)

Related Questions