Reputation: 77
I have people and departments with the following possible relationships
(p:Person)-[:WORKS_IN]->(d:Department)
(employee:Person)-[:REPORTS_TO]->(manager:Person)
(child:Department)-[:UNIT_OF]->(parent:Department)
I didn't get the information to identify the manager of a particular department, but within any department, the person with the most incoming [:REPORTS_TO] relationships is likely the boss. I'm trying to get a list of all the departments matched with the most "reported to" person in that department, but I'm failing terribly.
To make it more complex, there are "non-departmental" subgroups within the departments, defined by the fact that a person may report to someone in the department who's not the department's manager.
I'm looking for a way to output a list of all the department ids with the id of the most reported-to employee in that department. One row per department is ideal.
Thanks
Upvotes: 0
Views: 313
Reputation: 11216
I think something like this might get you started.
Find all of the Person
entries with REPORTS_TO
relationships and then find which department they work in. Order the result set by department and highest reports_to. Aggregate the results per department and then return only the first one.
MATCH (:Person)-[:REPORTS_TO]->(manager:Person)
WITH manager
//
MATCH (manager)-[:WORKS_IN]->(d:Department)
WITH d.name AS dept, manager.name AS manager, count(*) AS size
ORDER BY dept, size DESC
//
WITH dept, collect([manager,size]) AS managers
RETURN dept, managers[0][0] AS manager, managers[0][1] AS size
ORDER BY size DESC, dept
Upvotes: 2
Reputation: 30417
Dave's answer is quite good, but it misses cases where managers in a department tie for being reported to the most. If ties don't matter to you, stick with Dave's answer.
This query has a couple extra steps, but should return the manager in each department with the highest number of reports, including those who tie.
// first find managers with people reporting to them
MATCH (manager:Person)-[:WORKS_IN]->(d:Department)
WHERE EXISTS((:Person)-[:REPORTS_TO]->(manager))
// get the number of reporters for each manager, then find the max for all managers in each department per department
WITH d, manager, size( (:Person)-[:REPORTS_TO]->(manager) ) as reportSize
WITH d, max(reportSize) as maxReports
// find all managers in the department with the department's max report size
MATCH (manager:Person)-[:WORKS_IN]->(d)
WHERE size( (:Person)-[:REPORTS_TO]->(manager) ) = maxReports
RETURN d, COLLECT(manager) as topManager, maxReports
This is returning nodes for department and managers, but you can easily change this to output ids or names from their properties instead.
Aggregations in Cypher are grouped by the remaining non-aggregation columns, which is why we remove manager
from scope when we calculate maxReports
; that ensures that aggregation is done per department, not per manager and department (which would be the same as the reports per manager).
There may be a way to optimize using Neo4j 3.1's upcoming changes (still in development as of early September 2016), they do have a pattern comprehension feature that works like a subquery that could be very useful.
Upvotes: 5
Reputation: 445
You shouldn't need any collect statements, just a single count for # of relationships.
MATCH (worker:Person)-[REPORTS_TO]->(manager:Person),
(manager:Person)-[WORKS_IN]->(department:Department)
WITH manager.name as managerName,
count(*) AS reportCount,
department.name as departmentName
WHERE reportCount > 0
RETURN managerName, reportCount, departmentName
This query will count all REPORTS_TO relationships to a manager and return the manager's name, the number of reports and the department.
If you need an ORDER BY or GROUP BY statement you can add those to rank # of reports per-manager, per-department.
Upvotes: 2