Reputation: 33293
I have a graph with millions of nodes. For example:
          watched             director
user1 -----------> movie_1 <---------- chris nolan
                      ^
user2 ----------------|
          watched
and so on.
I want a query that computes the number of movies watched by each user, and then the average number of movies watched per user. How do I do this in Gremlin or Cypher?
Upvotes: 2
Views: 393
Reputation: 46226
Here's the Gremlin approach. First, the movies watched per person (note that this code is written to be run in the Gremlin REPL):
m = [:]
g.E.has('label','watched').groupCount(m){it.outV.next()}.iterate()
The code above iterates over all "watched" edges and groups on the out vertex of each "watched" edge (i.e. the user vertex). The group count is stored in the Map defined as m.
Now that we have m, we can use it to get the average:
total = m.values().sum()
avg = total / m.size()
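For readers without a Gremlin REPL at hand, the same two steps can be sketched in plain Python over a small in-memory edge list. The edges data below is hypothetical and only mimics the shape of the graph in the question. Note that, like the traversal, this counts "watched" edges, so a repeat viewing would count twice, and users with no "watched" edges never appear in m.

```python
from collections import Counter

# Hypothetical (out vertex, label, in vertex) edges standing in for the graph.
edges = [
    ("user1", "watched", "movie_1"),
    ("user2", "watched", "movie_1"),
    ("user2", "watched", "movie_2"),
    ("chris nolan", "director", "movie_1"),
]

# Rough equivalent of groupCount keyed on the out vertex of each "watched" edge.
m = Counter(src for src, label, _ in edges if label == "watched")

# Same arithmetic as the Gremlin snippet: total edges / number of users seen.
total = sum(m.values())
avg = total / len(m)
```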
Upvotes: 4
Reputation: 67044
Since you asked for either Cypher or Gremlin, below are the Cypher queries.
It was not obvious to me that your data model had any node labels, so here are some queries that only include user nodes that have watched at least 1 movie. This limitation stems from the fact that, without labels, there is no way to tell that a node with no outgoing watched relationship is actually a user.
(a) How to get each distinct user and the number of (distinct) movies that s/he watched. (Users who did not watch any movies will not be in the returned collection.)
MATCH (u)-[:watched]->(m)
RETURN u, COUNT(DISTINCT m);
I assumed you did not want to count the same movie twice for the same user (in those cases where the user watched the same movie multiple times).
(b) How to get the average number of (distinct) movies watched by all users (who watched any movies at all):
MATCH (u)-[:watched]->(m)
WITH u, COUNT(DISTINCT m) AS cdm
RETURN avg(cdm);
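To make the effect of DISTINCT concrete, here is a minimal Python sketch of the same aggregation over made-up data (the watched list and variable names are assumptions for illustration). Collecting movies into a set per user plays the role of COUNT(DISTINCT m), so a repeat viewing is counted once.

```python
from collections import defaultdict

# Hypothetical (user, movie) pairs; user1 watched movie_1 twice.
watched = [
    ("user1", "movie_1"),
    ("user1", "movie_1"),
    ("user2", "movie_1"),
    ("user2", "movie_2"),
]

# A set per user drops duplicate viewings, like COUNT(DISTINCT m).
movies_by_user = defaultdict(set)
for user, movie in watched:
    movies_by_user[user].add(movie)

distinct_counts = {u: len(ms) for u, ms in movies_by_user.items()}

# Average over users who watched at least one movie, like avg(cdm).
avg_distinct = sum(distinct_counts.values()) / len(distinct_counts)
```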
If you wanted to also include users who did not watch any movies, then you may need to make sure that all user nodes are labelled (say, by the label "User"). In the following queries, I assume that that has been done.
(a) How to get each distinct user and the number of (distinct) movies that s/he watched:
MATCH (u:User)
OPTIONAL MATCH (u)-[:watched]->(m)
RETURN u, COUNT(DISTINCT m);
(b) How to get the average number of (distinct) movies watched by all users:
MATCH (u:User)
OPTIONAL MATCH (u)-[:watched]->(m)
WITH u, COUNT(DISTINCT m) AS cdm
RETURN avg(cdm);
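Including users with no watched movies changes the denominator of the average. The following Python sketch, using hypothetical data in which user3 has watched nothing, shows the difference between the two averages.

```python
# Hypothetical data: user3 is a known user with no watched relationships.
all_users = ["user1", "user2", "user3"]
movies_by_user = {"user1": {"movie_1"}, "user2": {"movie_1", "movie_2"}}

# Like OPTIONAL MATCH, every user gets a row; unmatched users get a count of 0.
counts = {u: len(movies_by_user.get(u, set())) for u in all_users}

# Average over all users versus only users who watched something.
avg_all_users = sum(counts.values()) / len(counts)
watchers = [c for c in counts.values() if c > 0]
avg_watchers_only = sum(watchers) / len(watchers)
```

With this data the average over all users is lower than the average over watchers only, because user3 contributes a zero.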
Upvotes: 3