hsf
hsf

Reputation: 88

Neo4j (Cypher) - Is it possible to use non-implicit aggregation?

My question is fairly straightforward. I've been trying to write a Cypher query which uses an aggregation function - min().

I am trying to obtain the closest node to a particular node using the new Spatial functions offered in Neo4j 3.4. My query currently looks like this:

MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" }) 
WITH distance(a.location, b.location) AS dist, a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id 
RETURN orig_stop_id,min(dist) 

The location property is a point property and this query does actually do what I want it to do, except for one thing: I'd like to also include the dest_stop_id field in the result so that I can actually know which other node corresponds to this minimal distance, however Neo4j seems to aggregate implicitly all fields in the RETURN clause that are not inside an aggregate function and the result is I just get a list of all pairs (orig_stop_id, dest_stop_id) and their distance versus getting just the minimum and the corresponding dest_stop_id. Is there any way to specify which fields should be grouped in the result set?

In SQL, GROUP BY allows you to specify this but I haven't been able to find a similar function in Cypher.

Thanks in advance, please let me know if you need any extra information.

Upvotes: 1

Views: 292

Answers (3)

InverseFalcon
InverseFalcon

Reputation: 30397

Seems like you could skip the aggregation function and just order the distance and take the top:

MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" }) 
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist DESC
LIMIT 1
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, dist

As others here have mentioned you really should use labels here (otherwise is this doing all node scans to find your starting points, this is probably the main performance bottleneck of your query), and have indexes in place so you're using index lookups for both a and b.

EDIT

If you need the nearest when you have multiple starting nodes, you can take the head of the collected elements like so:

MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" }) 
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist DESC
WITH a, head(collect(b {.stop_id, dist})) as b
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, b.dist as dist

We do need to include dist into the map projection from b, otherwise it would be used as a grouping key along with a.

Alternately you could just collect b instead of the map projection and then recalculate with the distance() function per remaining row.

Upvotes: 1

cybersam
cybersam

Reputation: 66999

This should work:

MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" }) 
RETURN
  a.stop_id AS orig_stop_id,
  REDUCE(
    s = NULL,
    d IN COLLECT({dist: distance(a.location, b.location), sid: b.stop_id}) |
    CASE WHEN s.dist < d.dist THEN s ELSE {dist: d.dist, dest_stop_id: d.sid} END
  ) AS min_data

This query uses REDUCE to get the minimum distance and also the corresponding dest_stop_id at the same time.

The tricky part is that the first time the CASE clause is executed, s will be NULL. Afterwards, s will be a map. The CASE clause handles the special NULL situation by specifically performing the s.dist < d.dist test, which will always evaluate to false if s is NULL -- causing the ELSE clause to be executed in that case, initializing s to be a map.

NOTE: Ideally, you should use the labels for your nodes in your query, so that the query does not have to scan every node in the DB to find each node. Also, you may want to add the appropriate indexes to further speed up the query.

Upvotes: 1

SylvainRoussy
SylvainRoussy

Reputation: 359

You can use COLLECT for aggregation (note this query isn't checked) :

MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" }) 
WITH COLLECT (distance(a.location, b.location)) as distances, a.stop_id as stopId
UNWIND distances as distance
WITH min(distance) as min, stopId 
MATCH (bus { agency: "Bus", stop_id: stopId}), (train{ agency: "Train" }) 
WHERE distance(bus.location, train.location) = distance 
RETURN bus,train, distance

Hope this will help you.

Upvotes: 0

Related Questions