Greg Warren
Greg Warren

Reputation: 41

Neo4j Performing an Avg operation on same properties across nodes linked together

I have a Neo4j Database which follows this form of model:

One central node for the subject. All Current Medical Data joined to it with a Current Relationship as separate nodes. If a specific node contains a medical test score and the subject has had that test before it will be joined to the score with a Previous relationship.

e.g:

(subject)<-[:Current]-(current_medical_score)<-[:Previous*]<-(previous_medical_score)

What I am trying to do is to calculate the average of this medical test and return all subject ids where the average is less than a determined cut-off point. However my request seems to only consider the value in the current_medical_score node when it performs the average operation.

MATCH (subject)<-[:Current|:Previous*]-(scores)
WHERE scores.MISD IS NOT NULL AND scores.MISD <> 999
WITH scores, avg(scores.MISD) as sumMis
WHERE sumMis < 4
RETURN scores.SID, sumMis;

+---------------------+
| scores.SID | sumMis |
+---------------------+
| "330"      | 2.0    |
| "2445"     | 2.0    |
+---------------------+

I check whether the scores MISD isn't Null or 999 as some bodies which provide data utilise 999 to signify missing data. Subject 2445 has a current MIS score of 2 and one Previous score of 8 so the expected average should be 5.0

Not 2.0

I can't quite figure out the syntax required to go through variable number of relationships and calculate the average between the values in all the relevant nodes.

Any help would be greatly appreciated.

Upvotes: 2

Views: 161

Answers (2)

InverseFalcon
InverseFalcon

Reputation: 30397

The reason you're seeing these results is because of this:

WITH scores, avg(scores.MISD) as sumMis

Aggregation functions, like avg(), only have meaning with respect to the non-aggregation columns, which form the grouping key.

In this case, this line reads: "For each individual score (since scores here isn't the collection of scores, it's a single score, one score per record/row), find the average of that single score's MISD property". The resulting average per row is just the MISD property, since the average of a single value is the same value.

So the result of this clause is actually two rows, one for each :Score node, with MISD values of 2.0 and 8.0. Then your WHERE clause removes the row with the value of 8 (since it's >= 2), leaving you with sumMis of 2.0.

To ensure the avg() function executes for all values (instead of getting the avg of each :Score node individually, which is useless), you need to either aggregate all the scores at the same point you do the avg(), or remove scores from scope at that time.

As in Bruno's answer, removing scores from scope when you perform the avg() works. However, if each directly connected :Score node is a separate test type, and if you have multiple :Score nodes directly connected to the :Subject by :Current relationships, that query will take the average of all of them, which may not be what you want.

Ideally you would have some property for the test type, which would be the same for all the nodes in the :Previous* chain (but different from the other :Score nodes directly connected by :Current relationships to the subject) and you could use that as the grouping key:

WITH scores.testName as testName, avg(scores.MISD) as sumMis
WHERE sumMis < 4
RETURN subject.ID, testName, sumMis

Upvotes: 1

Bruno Peres
Bruno Peres

Reputation: 16365

Considering this sample data set:

create (subject:Subject {ID:"2442"})
    <-[:Current]-(:Score {MISD:2.0, SID: "2442"})
    <-[:Previous]-(:Score {MISD:8.0, SID: "2442"})

You problem is you are grouping results by score variable. You should group by subject instead. That is: WITH subject, (...) instead of WITH scores, (...)

MATCH (subject:Subject)<-[:Current|:Previous*]-(scores)
WHERE scores.MISD IS NOT NULL AND scores.MISD <> 999
WITH subject, avg(scores.MISD) as sumMis
RETURN subject.ID, sumMis;

The output:

╒════════════╤════════╕
│"subject.ID"│"sumMis"│
╞════════════╪════════╡
│"2442"      │5       │
└────────────┴────────┘

Upvotes: 1

Related Questions