geotheory
geotheory

Reputation: 23630

Neo4j cypher: combining aggregate, order by and limit

Relatively new to Cypher and can't see from documentation how to get what I need, which is a combination of ORDER BY and LIMIT with an aggregation, I'm guessing using COLLECT.

Take the following graph:

CREATE (colombia {name:'Colombia', pop:50})
CREATE (argentina {name:'Argentina', pop:43})
CREATE (peru {name:'Peru', pop:32})
CREATE (chile {name:'Chile', pop:17})
CREATE (ecuador {name:'Ecuador', pop:16})
CREATE (bolivia {name:'Bolivia', pop:11})
CREATE (paraguay {name:'Paraguay', pop:7})
CREATE (uruguay {name:'Uruguay', pop:3.4})
CREATE (venezuela {name:'Venezuela', pop:32})
CREATE
  (colombia)-[:BORDERS {length:586}]->(ecuador),
  (colombia)-[:BORDERS {length:2219}]->(venezuela),
  (colombia)-[:BORDERS {length:1800}]->(peru),
  (ecuador)-[:BORDERS {length:1420}]->(peru),
  (peru)-[:BORDERS {length:171}]->(chile),
  (peru)-[:BORDERS {length:1075}]->(bolivia),
  (chile)-[:BORDERS {length:6691}]->(argentina),
  (argentina)-[:BORDERS {length:942}]->(bolivia),
  (argentina)-[:BORDERS {length:1880}]->(paraguay),
  (argentina)-[:BORDERS {length:541}]->(uruguay),
  (bolivia)-[:BORDERS {length:750}]->(paraguay)

To illustrate what I need consider the query:

MATCH (a)-[ab]-(b)-[bc]-(c) 
WHERE a.name = 'Bolivia'
WITH a, b, c, bc
ORDER BY b.pop DESC, bc.length DESC
RETURN a.name, b.name, b.pop, c.name, bc.length

╒═════════╤═══════════╤═══════╤═══════════╤═══════════╕
│"a.name" │"b.name"   │"b.pop"│"c.name"   │"bc.length"│
╞═════════╪═══════════╪═══════╪═══════════╪═══════════╡
│"Bolivia"│"Argentina"│43     │"Chile"    │6691       │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Argentina"│43     │"Paraguay" │1880       │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Argentina"│43     │"Uruguay"  │541        │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Peru"     │32     │"Colombia" │1800       │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Peru"     │32     │"Ecuador"  │1420       │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Peru"     │32     │"Chile"    │171        │
├─────────┼───────────┼───────┼───────────┼───────────┤
│"Bolivia"│"Paraguay" │7      │"Argentina"│1880       │
└─────────┴───────────┴───────┴───────────┴───────────┘

I need to limit the results firstly to the top 2 matching (b) nodes by pop attribute, then the top 2 (c) nodes by [bc.length] for each case of (b). In this example that means rows 1,2,4 & 5.

I've experimented with COLLECT but got nowhere. Very grateful for assistance.

Upvotes: 1

Views: 337

Answers (1)

cybersam
cybersam

Reputation: 66957

This query:

MATCH (a)--(b)-[bc]-(c) 
WHERE a.name = 'Bolivia'
WITH a, b, c, bc
ORDER BY b.pop DESC, bc.length DESC
WITH a, b, COLLECT({c: c, bc: bc}) AS data
LIMIT 2
UNWIND data[..2] AS d
RETURN a.name, b.name, b.pop, d.c.name, d.bc.length

returns:

╒═════════╤═══════════╤═══════╤══════════╤═════════════╕
│"a.name" │"b.name"   │"b.pop"│"d.c.name"│"d.bc.length"│
╞═════════╪═══════════╪═══════╪══════════╪═════════════╡
│"Bolivia"│"Argentina"│43     │"Chile"   │6691         │
├─────────┼───────────┼───────┼──────────┼─────────────┤
│"Bolivia"│"Argentina"│43     │"Paraguay"│1880         │
├─────────┼───────────┼───────┼──────────┼─────────────┤
│"Bolivia"│"Peru"     │32     │"Colombia"│1800         │
├─────────┼───────────┼───────┼──────────┼─────────────┤
│"Bolivia"│"Peru"     │32     │"Ecuador" │1420         │
└─────────┴───────────┴───────┴──────────┴─────────────┘

COLLECT is an aggregating function that uses the non-aggregating terms in the same WITH (or RETURN) clause as the "grouping key". In this query, COLLECT is collecting an ordered list of c and bc values for each distinct (and ordered) pair of a and b values. The LIMIT clause limits the WITH results to the first two a and b pairs (and their data lists).

Upvotes: 2

Related Questions