Filtering Arango query results to report an array intersection in AQL

Question

I am experimenting with ArangoDB as a replacement for Postgres which I am currently using. In Postgres I have a table, zuffs containing rows bearing the following form

 hash       passes     visits   

 123        {1,2,4}   {2,3,4,5}

where both passes and visits are int[]. In order to establish the intersection of passes and visits` I would write

SELECT ARRAY
(
 SELECT UNNEST(a1) INTERSECT SELECT UNNEST(a2))
 FROM (SELECT passes AS a1,visits as a2 FROM zuffs where hash = 1) q;

which Postgres obligingly executes to return the result {2,4}.

Now suppose I have the collection zuffs in ArangoDB with the following documents

{"hash":45,"passes":[1,2,3],"visits":[3,11,17]}

{"hash":76,"passes":[11,2],"visits":[3,4,17]}

{"hash":13,"passes":[11,21],"visits":[13,44,27]}

{"hash":7,"passes":[2],"visits":[4,67]}

It is not clear to me how I would do the following

Establish the, intersection, I1 of passes and visits for the document bearing the hash 45.
Take that result and return the hashes for other documents in the same collection that have a non-empty intersection with the the intersection I2 obtained above.

While there is much I like about ArangoDB I find it unfortunate that it has its own query language instead of just using the required superset of SQL. In this instance I have figured out that I will somehow have to use FOR IN along with FILTER but it is not at all clear to me how.

kerry · Accepted Answer

Being a SQL expert, I also fought against the syntax differences of AQL. However, it really did not end up being that hard to understand, which made "learning" just a function of time and use.

I'm sure there are other ways to do this, but here's a quick/dirty example:

Find the document you want to match, and get it's intersection product
For each document z in zuffs, calculate the intersection and look for matches
Return the hash of matching documents

LET match = (
    FOR z IN zuffs
        FILTER z.hash == 45
        FOR i IN INTERSECTION(z.passes, z.visits)
            RETURN i
)
FOR z IN zuffs
    LET i = INTERSECTION(z.passes, z.visits)
    FOR m IN match
        FILTER m IN i
            RETURN z.hash

Returns:

[
  45
]

Given your example data set, you will only get back one document with meeting the requirements (45). Adding more documents or modifying one of the other documents to have a common intersection will provide more interesting results.

Things to keep in mind:

Moving from SQL to AQL makes you shift your thinking from "sets" (of rows) to "loops" (over things).
Keep track of the return (or native) types (array vs object/string/etc.), and treat each piece of data accordingly (note the FOR i IN INTERSECTION... in the match).
Use the Explain and Profile buttons (or functions) to see how your query is performing. Think of other ways the result can be accomplished, and try to make it faster!

For instance:

LET match = FIRST(
    FOR z IN zuffs
        FILTER z.hash == 45
        RETURN INTERSECTION(z.passes, z.visits)
)
FOR z IN zuffs
    FILTER LENGTH( INTERSECTION(match, INTERSECTION(z.passes, z.visits)) ) > 0
        RETURN z.hash

The match sections of both examples return the same result (an array with a single value of 3), but they do it in different ways. And instead of doing a FOR m IN match..., I could have used native array functions with a filter. In reality the first example is much quicker than the second, and the reason is evident in the "explain" plan.

I found it immensely helpful become familiar with the high level and function documentation. Those two places will have almost everything you need to succeed with AQL (aside from "big picture" stuff like query tuning, indexing, etc.).

Filtering Arango query results to report an array intersection in AQL

Answers (1)

Related Questions