Georg Heiler

Reputation: 17676

neo4j single pass over graph but multiple matches

I have a graph in Neo4j with the following vertices:

person:ID,name,value:int,:LABEL
1,Alice,1,Person
2,Bob,0,Person
3,Charlie,0,Person
4,David,0,Person
5,Esther,0,Person
6,Fanny,0,Person
7,Gabby,0,Person
8,XXXX,1,Person

and edges:

:START_ID,:END_ID,:TYPE
1,2,call
2,3,text
3,2,text
6,3,text
5,6,text
5,4,call
4,1,call
4,5,text
1,5,call
1,8,call
6,8,call
6,8,text
8,6,text
7,1,text

The data is imported into Neo4j like this:

DATA_DIR_SAMPLE=/data_network/
$NEO4J_HOME/bin/neo4j-admin import --mode=csv \
  --database=graph.db \
  --nodes:Person ${DATA_DIR_SAMPLE}/vertices.csv \
  --relationships ${DATA_DIR_SAMPLE}/edges.csv

Now when querying the graph like:

MATCH (source:Person)-[*1]-(destination:Person)
RETURN source.name, source.value, avg(destination.value), 'undir_1_any' as type
UNION ALL
MATCH (source:Person)-[*2]-(destination:Person)
RETURN source.name, source.value, avg(destination.value), 'undir_2_any' as type

one can see that the graph is traversed multiple times, once per pattern. Additionally, because I want to obtain a table like:

Vertex | value | type_undir_1_any | type_undir_2_any
Alice  | 1     | 0.2              |  0

an additional aggregation step (pivot/reshape) would be required.

In the future, I would like to add the following patterns

Is there a better way to combine the queries?

Upvotes: 1

Views: 97

Answers (1)

stdob--

Reputation: 29172

You need to match both path lengths in a single pattern and aggregate by path length, computing each average yourself with conditional sums:

MATCH p = (source:Person)-[*1..2]-(destination:Person)
WITH 
  length(p) as L, source, destination
RETURN 
  source.name as Vertex, 
  source.value as value, 
  1.0 * 
      sum(CASE WHEN L = 1 THEN destination.value ELSE 0 END) / 
      sum(CASE WHEN L = 1 THEN 1 ELSE 0 END) as type_undir_1_any,
  1.0 * 
      sum(CASE WHEN L = 2 THEN destination.value ELSE 0 END) /
      sum(CASE WHEN L = 2 THEN 1 ELSE 0 END) as type_undir_2_any

Or, more elegantly, use a function from the APOC library to compute the average of a collection:

MATCH p = (source:Person)-[*1..2]-(destination:Person)
RETURN 
  source.name as Vertex, 
  source.value as value,
  apoc.coll.avg(COLLECT(
    CASE WHEN length(p) = 1 THEN destination.value ELSE NULL END
  )) as type_undir_1_any,
  apoc.coll.avg(COLLECT(
    CASE WHEN length(p) = 2 THEN destination.value ELSE NULL END
  )) as type_undir_2_any
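
A variant without APOC, as a sketch only: Cypher's built-in avg() skips null values, and a CASE expression with no ELSE branch evaluates to null, so the same conditional aggregation should also work with avg() directly. The alias hops is just an illustrative name, not something from the question. A vertex that has no paths of a given length should get null in that column.

```cypher
// Single traversal covering both path lengths (1 and 2 hops, any direction)
MATCH p = (source:Person)-[*1..2]-(destination:Person)
// hops is an illustrative alias for the path length
WITH source, destination, length(p) AS hops
RETURN
  source.name AS Vertex,
  source.value AS value,
  // CASE without ELSE yields null for non-matching rows; avg() ignores nulls
  avg(CASE WHEN hops = 1 THEN destination.value END) AS type_undir_1_any,
  avg(CASE WHEN hops = 2 THEN destination.value END) AS type_undir_2_any
```

Each additional pattern that differs only in path length can then be added as one more avg(CASE ...) column instead of another UNION ALL branch.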

Upvotes: 1
