Georg Heiler

Reputation: 17676

neo4j multiple match aggregations - single pass over graph?

I have the following graph in Neo4j:

CREATE (Alice:Person {id:'a', fraud:1})
CREATE (Bob:Person {id:'b', fraud:0})
CREATE (Charlie:Person {id:'c', fraud:0})
CREATE (David:Person {id:'d', fraud:0})
CREATE (Esther:Person {id:'e', fraud:0})
CREATE (Fanny:Person {id:'f', fraud:0})
CREATE (Gabby:Person {id:'g', fraud:0})
CREATE (Fraudster:Person {id:'h', fraud:1})


CREATE
  (Alice)-[:CALL]->(Bob),
  (Bob)-[:SMS]->(Charlie),
  (Charlie)-[:SMS]->(Bob),
  (Fanny)-[:SMS]->(Charlie),
  (Esther)-[:SMS]->(Fanny),
  (Esther)-[:CALL]->(David),
  (David)-[:CALL]->(Alice),
  (David)-[:SMS]->(Esther),
  (Alice)-[:CALL]->(Esther),
  (Alice)-[:CALL]->(Fanny),
  (Fanny)-[:CALL]->(Fraudster)

The question "neo4j percentage of attribute for social network" shows how to calculate the percentage of fraudulent contacts in a social network:

MATCH (:Person)-[:CALL|:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage

How can I modify this to aggregate separately for the different relationship types while still requiring only a single pass over the graph? In other words, I want something more efficient than simply running the following three statements:

MATCH (:Person)-[:CALL]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage

MATCH (:Person)-[:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage

MATCH (:Person)-[:CALL|:SMS]->(f:Person)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
RETURN f, COUNT(*)/divisor AS percentage

Ideally, a single query would return percentage_total, percentage_sms, and percentage_phone.

Upvotes: 0

Views: 61

Answers (1)

Gabor Szarnyas

Reputation: 5047

If you would like to keep the results together, you need to chain the queries using WITH and pass along the f variable for the person. Unfortunately, you also have to keep passing all percentage_* variables in all WITH clauses, so it gets quite difficult to maintain:

MATCH (f:Person)
OPTIONAL MATCH (:Person)-[:CALL|:SMS]->(f)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs
UNWIND fs AS f
WITH divisor, f
WHERE f.fraud = 1
WITH f, COUNT(*)/divisor AS percentage_all

OPTIONAL MATCH (:Person)-[:CALL]->(f)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs, percentage_all
UNWIND fs AS f
WITH divisor, f, percentage_all
WHERE f.fraud = 1
WITH f, percentage_all, COUNT(*)/divisor AS percentage_phone

OPTIONAL MATCH (:Person)-[:SMS]->(f)
WITH TOFLOAT(COUNT(*))/100 AS divisor, COLLECT(f) AS fs, percentage_all, percentage_phone
UNWIND fs AS f
WITH divisor, f, percentage_all, percentage_phone
WHERE f.fraud = 1
RETURN f, percentage_all, percentage_phone, COUNT(*)/divisor AS percentage_sms

The openCypher project has proposed nested subqueries, but it will take some time for them to make it into Neo4j.
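If overall percentages (rather than one row per fraudulent person) are enough, a single MATCH with conditional aggregation might be an alternative. The following is only a sketch under that assumption: it binds the relationship to a variable r (not present in the queries above), uses type(r) to tell CALL from SMS, and counts with SUM over CASE expressions. The intermediate names total, calls, sms, and fraud_* are illustrative.

```cypher
// Match every CALL or SMS relationship once and keep the relationship in r
MATCH (:Person)-[r:CALL|SMS]->(f:Person)
WITH
  COUNT(*) AS total,
  // relationships per type
  SUM(CASE WHEN type(r) = 'CALL' THEN 1 ELSE 0 END) AS calls,
  SUM(CASE WHEN type(r) = 'SMS' THEN 1 ELSE 0 END) AS sms,
  // relationships whose target is flagged as fraud, overall and per type
  SUM(CASE WHEN f.fraud = 1 THEN 1 ELSE 0 END) AS fraud_total,
  SUM(CASE WHEN type(r) = 'CALL' AND f.fraud = 1 THEN 1 ELSE 0 END) AS fraud_calls,
  SUM(CASE WHEN type(r) = 'SMS' AND f.fraud = 1 THEN 1 ELSE 0 END) AS fraud_sms
RETURN
  // guard against division by zero when a type (or the whole graph) has no relationships
  CASE WHEN total = 0 THEN null ELSE 100.0 * fraud_total / total END AS percentage_total,
  CASE WHEN calls = 0 THEN null ELSE 100.0 * fraud_calls / calls END AS percentage_phone,
  CASE WHEN sms = 0 THEN null ELSE 100.0 * fraud_sms / sms END AS percentage_sms
```

On the sample graph from the question there are 11 relationships (6 CALL, 5 SMS), and 2 of them, both CALL, point at a person with fraud = 1. The sketch should therefore return roughly 18.18 for percentage_total, 33.33 for percentage_phone, and 0.0 for percentage_sms. Per-person results, as in the original query, would need a different grouping.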

Upvotes: 2
