Reputation: 88
My question is fairly straightforward. I've been trying to write a Cypher query which uses an aggregation function - min()
.
I am trying to obtain the closest node to a particular node using the new Spatial functions offered in Neo4j 3.4. My query currently looks like this:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id
RETURN orig_stop_id,min(dist)
The location
property is a point
property and this query does actually do what I want it to do, except for one thing: I'd like to also include the dest_stop_id
field in the result so that I can actually know which other node corresponds to this minimal distance, however Neo4j seems to aggregate implicitly all fields in the RETURN
clause that are not inside an aggregate function and the result is I just get a list of all pairs (orig_stop_id
, dest_stop_id
) and their distance versus getting just the minimum and the corresponding dest_stop_id
. Is there any way to specify which fields should be grouped in the result set?
In SQL, GROUP BY
allows you to specify this but I haven't been able to find a similar function in Cypher.
Thanks in advance, please let me know if you need any extra information.
Upvotes: 1
Views: 292
Reputation: 30397
Seems like you could skip the aggregation function and just order the distance and take the top:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist DESC
LIMIT 1
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, dist
As others here have mentioned you really should use labels here (otherwise is this doing all node scans to find your starting points, this is probably the main performance bottleneck of your query), and have indexes in place so you're using index lookups for both a and b.
EDIT
If you need the nearest when you have multiple starting nodes, you can take the head of the collected elements like so:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist DESC
WITH a, head(collect(b {.stop_id, dist})) as b
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, b.dist as dist
We do need to include dist
into the map projection from b
, otherwise it would be used as a grouping key along with a
.
Alternately you could just collect b
instead of the map projection and then recalculate with the distance()
function per remaining row.
Upvotes: 1
Reputation: 66999
This should work:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
RETURN
a.stop_id AS orig_stop_id,
REDUCE(
s = NULL,
d IN COLLECT({dist: distance(a.location, b.location), sid: b.stop_id}) |
CASE WHEN s.dist < d.dist THEN s ELSE {dist: d.dist, dest_stop_id: d.sid} END
) AS min_data
This query uses REDUCE
to get the minimum distance and also the corresponding dest_stop_id
at the same time.
The tricky part is that the first time the CASE
clause is executed, s
will be NULL
. Afterwards, s
will be a map. The CASE
clause handles the special NULL
situation by specifically performing the s.dist < d.dist
test, which will always evaluate to false
if s
is NULL
-- causing the ELSE
clause to be executed in that case, initializing s
to be a map.
NOTE: Ideally, you should use the labels for your nodes in your query, so that the query does not have to scan every node in the DB to find each node. Also, you may want to add the appropriate indexes to further speed up the query.
Upvotes: 1
Reputation: 359
You can use COLLECT for aggregation (note this query isn't checked) :
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH COLLECT (distance(a.location, b.location)) as distances, a.stop_id as stopId
UNWIND distances as distance
WITH min(distance) as min, stopId
MATCH (bus { agency: "Bus", stop_id: stopId}), (train{ agency: "Train" })
WHERE distance(bus.location, train.location) = distance
RETURN bus,train, distance
Hope this will help you.
Upvotes: 0