alexanoid
alexanoid

Reputation: 25770

Neo4j Cypher group by a column in a list of rows for aggregation

I have the following Neo4j Cypher query:

MATCH (v:Vacancy {deleted: false})-[vv:HAS_VOTE_ON]->(c:Criterion)<-[vp:HAS_VOTE_ON]-(p:Profile {id: 703, deleted: false})
WHERE vv.avgVotesWeight > 0 AND vv.avgVotesWeight <= vp.avgVotesWeight
WITH v, p
MATCH (v)-[vv1:HAS_VOTE_ON]->(cv:Criterion)
OPTIONAL MATCH (p)-[vp1:HAS_VOTE_ON]->(cv)
WITH v.id as vacancyId, cv.id as criterionId, coalesce(vv1.`properties.skillCoefficient`, 1.0) as vacancyCriterionCoefficient, coalesce(vp1.avgVotesWeight, 0) as profileCriterionVoteWeight, coalesce(vp1.totalVotes, 0) as profileCriterionTotalVotes
RETURN vacancyId, criterionId, vacancyCriterionCoefficient, profileCriterionVoteWeight, profileCriterionTotalVotes

which returns the following values:

enter image description here

Now, for each Vacancy (with the same vacancyId) I need to calculate totalProfileCriterionVoteWeight (SUM) for all criteria by the folowing formula:

vacancyCriterionCoefficient * profileCriterionVoteWeight

For this purpose, I need to group somehow the rows by vacancyId.

Could you please show how it is possible with a Cypher here?

Upvotes: 2

Views: 349

Answers (1)

nimrod serok
nimrod serok

Reputation: 16033

You can replace your last line with:

WITH distinct(vacancyId) as vacancyId, sum(vacancyCriterionCoefficient * profileCriterionVoteWeight) as totalProfileCriterionVoteWeight 
RETURN vacancyId, totalProfileCriterionVoteWeight

Which For the data shown in the picture will return:

╒═══════════╤═════════════════════════════════╕
│"vacancyId"│"totalProfileCriterionVoteWeight"│
╞═══════════╪═════════════════════════════════╡
│704        │22                               │
├───────────┼─────────────────────────────────┤
│706        │16                               │
└───────────┴─────────────────────────────────┘

Explanation: distinct allows to "group" the rows, only with an "accumulator" to other fields. Here we just needed to use SUM as an accumulator.

In order to test it, I used sample data:

MERGE (a:Node{vacancyId:704, criterionId: 6907, vacancyCriterionCoefficient: 1, profileCriterionVoteWeight: 1, profileCriterionTotalVotes: 1})  
MERGE (b:Node{vacancyId:704, criterionId: 6909, vacancyCriterionCoefficient: 3, profileCriterionVoteWeight: 5, profileCriterionTotalVotes: 1})  
MERGE (c:Node{vacancyId:704, criterionId: 6908, vacancyCriterionCoefficient: 2, profileCriterionVoteWeight: 3, profileCriterionTotalVotes: 1})  
MERGE (d:Node{vacancyId:706, criterionId: 6909, vacancyCriterionCoefficient: 1, profileCriterionVoteWeight: 5, profileCriterionTotalVotes: 1})  
MERGE (e:Node{vacancyId:706, criterionId: 6908, vacancyCriterionCoefficient: 3, profileCriterionVoteWeight: 3, profileCriterionTotalVotes: 1})  
MERGE (f:Node{vacancyId:706, criterionId: 6907, vacancyCriterionCoefficient: 2, profileCriterionVoteWeight: 1, profileCriterionTotalVotes: 1})  

And query:

MATCH (n)
WITH n.vacancyId as vacancyId, n.criterionId as criterionId, n.vacancyCriterionCoefficient as vacancyCriterionCoefficient, n.profileCriterionVoteWeight as profileCriterionVoteWeight, n.profileCriterionTotalVotes as profileCriterionTotalVotes
WITH distinct(vacancyId) as vacancyId, sum(vacancyCriterionCoefficient * profileCriterionVoteWeight) as totalProfileCriterionVoteWeight 
//return vacancyId, criterionId, vacancyCriterionCoefficient, profileCriterionVoteWeight, profileCriterionTotalVotes
RETURN vacancyId, totalProfileCriterionVoteWeight

Which provide the results above

Upvotes: 2

Related Questions