Reputation: 1462
I have an ArangoDB with two collections, one for Documents, and one for Edges, I'd like an efficient query capable of only returning the Edges that share the same _to
and _from
values.
For example, assume my Documents Collection contains A, B, C, and D. Let's also assume my Edges Collection contains X, Y, and Z.
Edge X is _from
A and _to
B.
Edge Y is _from
A and _to
B as well.
Edge Z is _from
C and _to
D.
However, I do not know that X and Y are basically the same Edge, and I don't know that the Documents that share similar Edges are A & B. My query is to seek out these duplicate Edges.
What I have so far looks like this:
FOR ec1 IN edge_collection
FOR ec2 IN edge_collection
FILTER ec1._key != ec2._key AND ec1._to == ec2._to AND ec1._from == ec2._from
RETURN ec1
This seems to work, though it also feels terribly inefficient. Is there a better way to go about doing this, or is this the best possible solution with AQL?
Upvotes: 1
Views: 1602
Reputation: 1462
After some more digging, I have found a significantly faster methodology using the COLLECT statement.
Also, full disclosure, this was me building off of this other answer.
LET duplicates = (
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to WITH COUNT INTO count
FILTER count > 1
RETURN {
from: from,
to: to,
count: count
}
)
FOR d IN duplicates
FOR ec IN edge_collection
FILTER d.from == ec._from AND d.to == ec._to
RETURN ec
EDIT:
Building on @CodeManX's answer, my end goal was to be able to delete or rather REMOVE these duplicate values. If someone stumbles upon this with a similar goal, this might be helpful:
LET duplicates = (
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to AGGREGATE count = LENGTH(1) INTO edges = ec._key
FILTER count > 1
RETURN { from, to, count, edges }
)
FOR d IN duplicates
LET key_to_delete = FIRST(d.edges)
REMOVE { _key: key_to_delete } IN edge_collection
Upvotes: 1
Reputation: 11915
You can group by _from
and _to
, count how many edges there are per group, and filter out the unique combinations:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to WITH COUNT INTO count
FILTER count > 1
RETURN { from, to, count }
Or if you want to return the edge keys as well:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to INTO edges = ec._key
LET count = LENGTH(edges)
FILTER count > 1
RETURN { from, to, count, edges }
Alternatively using an aggregation instead of a post-calculation:
FOR ec IN edge_collection
COLLECT from = ec._from, to = ec._to AGGREGATE count = LENGTH(1) INTO edges = ec._key
FILTER count > 1
RETURN { from, to, count, edges }
To return the full edges use INTO edges = ec
instead. You could also use just INTO edges
but then each edge will be nested in an object {"ec": … }
.
Upvotes: 2