frazman

Reputation: 33293

Querying the whole graph using Gremlin or Cypher

I have a graph with millions of nodes. A small example:

      watched           director
user1 -------> movie_1 <------ chris nolan
                 ^
user2------------|  
      watched

and so on..

I want to write a query that computes the number of movies watched by each user.

I then want the average number of movies watched per user. How do I do this in Gremlin or Cypher?

Upvotes: 2

Views: 393

Answers (2)

stephen mallette

Reputation: 46226

Here's the Gremlin approach. First, the number of movies watched per person (note that this code is written to be run in the Gremlin REPL):

m = [:]
g.E.has('label','watched').groupCount(m){it.outV.next()}.iterate()

The code above iterates over all "watched" edges and groups them by the out vertex of each edge (i.e. the user vertex). The resulting counts are stored in the Map m, keyed by user vertex.

Now that we have m we can use that to get the average:

total = m.values().sum()
avg = total / m.size()
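
To make the group-then-average logic concrete, here is a minimal Python sketch of the same two steps on a tiny in-memory edge list. It is not Gremlin, just the equivalent computation; the edge tuples and the names `edges`, `watched_counts`, and `average_watched` are made up for illustration.

```python
from collections import Counter

# Hypothetical edge list: (out_vertex, label, in_vertex)
edges = [
    ("user1", "watched", "movie_1"),
    ("user2", "watched", "movie_1"),
    ("user2", "watched", "movie_2"),
    ("chris nolan", "director", "movie_1"),
]


def watched_counts(edges):
    """Count 'watched' edges per out vertex (the user), like groupCount(m)."""
    return Counter(out_v for out_v, label, _ in edges if label == "watched")


def average_watched(counts):
    """Average over users that have at least one 'watched' edge."""
    return sum(counts.values()) / len(counts) if counts else 0.0


m = watched_counts(edges)
avg = average_watched(m)
```

Like the Gremlin above, this sketch counts edges, so a user with two "watched" edges to the same movie would have that movie counted twice, and users with no "watched" edges do not appear in m at all.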

Upvotes: 4

cybersam

Reputation: 67044

Since you asked for either Cypher or Gremlin, below are the Cypher queries.

  1. It was not obvious to me that your data model had any node labels, so here are some queries that only include user nodes that have watched at least 1 movie. This limitation exists because, without labels, there is no way to tell that a node with no outgoing watched relationship is actually a user.

    (a) How to get each distinct user and the number of (distinct) movies that s/he watched. (Users who did not watch any movies will not be in the returned collection.)

    MATCH (u)-[:watched]->(m)
    RETURN u, COUNT(DISTINCT m);
    

    I assumed you did not want to count the same movie twice for the same user (in those cases where the user watched the same movie multiple times).

    (b) How to get the average number of (distinct) movies watched by all users (who watched any movies at all):

    MATCH (u)-[:watched]->(m)
    WITH u, COUNT(DISTINCT m) AS cdm
    RETURN avg(cdm);
    
  2. If you wanted to also include users who did not watch any movies, then you may need to make sure that all user nodes are labelled (say, by the label "User"). In the following queries, I assume that that has been done.

    (a) How to get each distinct user and the number of (distinct) movies that s/he watched:

    MATCH (u:User)
    OPTIONAL MATCH (u)-[:watched]->(m)
    RETURN u, COUNT(DISTINCT m);
    

    (b) How to get the average number of (distinct) movies watched by all users:

    MATCH (u:User)
    OPTIONAL MATCH (u)-[:watched]->(m)
    WITH u, COUNT(DISTINCT m) AS cdm
    RETURN avg(cdm);
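
The two averages above can differ, because the second one also divides by users who watched nothing. A small Python sketch of that difference, assuming a made-up list of users and (user, movie) pairs (all names and data here are hypothetical, and this only mirrors the logic of the Cypher queries rather than running them):

```python
from collections import defaultdict

# Hypothetical data: all known users, plus (user, movie) "watched" pairs.
users = ["user1", "user2", "user3"]
watched = [
    ("user1", "movie_1"),
    ("user2", "movie_1"),
    ("user2", "movie_2"),
    ("user2", "movie_2"),  # repeat viewing, counted once (like COUNT(DISTINCT m))
]


def distinct_movies_per_user(watched):
    """Map each user with at least one viewing to their distinct-movie count."""
    seen = defaultdict(set)
    for user, movie in watched:
        seen[user].add(movie)
    return {user: len(movies) for user, movies in seen.items()}


def average_watchers_only(watched):
    """Analogue of 1(b): average over users who watched at least one movie."""
    counts = distinct_movies_per_user(watched)
    return sum(counts.values()) / len(counts) if counts else 0.0


def average_all_users(users, watched):
    """Analogue of 2(b): users with no viewings contribute a count of 0."""
    counts = distinct_movies_per_user(watched)
    return sum(counts.get(u, 0) for u in users) / len(users) if users else 0.0
```

With this sample data, `average_watchers_only(watched)` divides 3 distinct viewings by 2 users, while `average_all_users(users, watched)` divides the same 3 by all 3 users.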
    

Upvotes: 3
