Dmitry Bubnenkov

Reputation: 9859

How to find duplicate documents?

It's very strange that I could not find an answer to such a simple question, either in the documentation or here: how do I find duplicated records in a collection? For example, I need to find the documents with a duplicated id among the following:

{"id": 1, "name": "Mike"},
{"id": 2, "name": "Jow"},
{"id": 3, "name": "Piter"},
{"id": 1, "name": "Robert"}

I need a query that will return the two documents with the same id (id: 1 in my case).

Upvotes: 4

Views: 1425

Answers (1)

David Thomas

Reputation: 2349

Have a look at the AQL COLLECT operation. It can group documents by a value, such as your id key, and return the number of documents in each group, which shows you which values occur more than once.

ArangoDB AQL - COLLECT

You can use LET liberally in AQL to break a query into smaller steps and reuse the output of one step later in the same query.

It may be possible to also collapse it all into one query, but this technique helps break it down.

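// Step 1: group by id and keep only the ids that occur more than once
LET duplicates = (
    FOR d IN myCollection
    COLLECT id = d.id WITH COUNT INTO count
    FILTER count > 1
    RETURN {
        id: id,
        count: count
    }
)

// Step 2: fetch every document whose id is one of the duplicated ids
FOR d IN duplicates
    FOR m IN myCollection
        FILTER d.id == m.id
        RETURN m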

This will return:

[
  {
    "_key": "416140",
    "_id": "myCollection/416140",
    "_rev": "_au4sAfS--_",
    "id": 1,
    "name": "Mike"
  },
  {
    "_key": "416176",
    "_id": "myCollection/416176",
    "_rev": "_au4sici--_",
    "id": 1,
    "name": "Robert"
  }
]
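
The two steps can probably be collapsed into a single query. Here is a minimal sketch, assuming the same myCollection and id attribute as in the question; docs and doc are only illustrative variable names. It relies on the INTO variable = expression form of COLLECT, which gathers the documents of each group into an array, so a second pass over the collection should not be needed:

```aql
FOR d IN myCollection
    // Group by id and keep the full documents of each group in `docs`
    COLLECT id = d.id INTO docs = d
    // Keep only the groups that contain more than one document
    FILTER LENGTH(docs) > 1
    // Flatten the groups back into individual documents
    FOR doc IN docs
        RETURN doc
```

For the sample data this should return the same two documents (Mike and Robert), though not necessarily in the same order.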

Upvotes: 5
