computer

Reputation: 25

Cypher query for graph

How can I determine the in-degree, out-degree, and total degree of each node in the graph? I also need the longest path in the graph (that is, its diameter) and the density of the graph. Finally, I need the number of relationships per type and the number of nodes per label.

I used this query to load my dataset:

LOAD CSV WITH HEADERS FROM 'file:///vgsales.csv' AS line 
CREATE (v:Vgsales {
  rank: toInteger(line.Rank),
  name: line.Name,
  platform: line.Platform,
  year: toInteger(line.Year)
}) 
MERGE (g:GENRE {genre: line.Genre}) 
MERGE (p:PUBLISHER {
  publisher: line.Publisher,
  NA_sales: toInteger(line.NA_Sales), 
  EU_sales: toInteger(line.EU_Sales),
  JP_sales: toInteger(line.JP_Sales),
  Other_sales: toInteger(line.Other_Sales),
  Global_sales: toInteger(line.Global_Sales)
}) 
MERGE (v)-[:IN_GENRE]->(g) 
MERGE (p)-[:PUBLISHED]->(v)

Upvotes: 0

Views: 461

Answers (1)

InverseFalcon

Reputation: 30407

For a node, size(<pattern>) gives the degree for that pattern without expanding the relationships, provided the pattern gives no label for the other node and includes no properties; filtering on either of those requires actually expanding each path.

So to get the in-degree, out-degree, and total degree of every node in the graph, you can use:

MATCH (n)
WITH n, size((n)-->()) AS outDegree, size((n)<--()) AS inDegree
RETURN id(n) AS id, outDegree, inDegree, outDegree + inDegree AS totalDegree
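
As a side note, and assuming a newer server than the 4.x line this answer targets: on Neo4j 5 and later, size() no longer accepts a pattern, and COUNT { } subquery expressions take its place, while elementId() replaces the deprecated id(). A sketch of the same degree query in that syntax might look like this:

```cypher
// Assumes Neo4j 5+: COUNT { ... } is used instead of size(<pattern>)
MATCH (n)
WITH n,
     COUNT { (n)-->() } AS outDegree,
     COUNT { (n)<--() } AS inDegree
RETURN elementId(n) AS id, outDegree, inDegree,
       outDegree + inDegree AS totalDegree
```

Summing the two directed counts means a self-loop contributes to both the in-degree and the out-degree, which matches the usual definition of total degree.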

The diameter of the graph is the longest of the shortest paths between any two nodes, so you first need the shortest path for every pair of nodes and then take the longest of those:

MATCH (n)
WITH collect(n) as allNodes
UNWIND allNodes as a
UNWIND allNodes as b
WITH a, b
WHERE id(a) < id(b)
MATCH path = shortestPath((a)-[*]-(b))
RETURN max(length(path)) as diameter

The id(a) < id(b) restriction filters out rows where a and b are the same node and also removes mirrored pairs, so each combination of a and b is evaluated only once (and not a second time with a and b swapped).
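
If it also helps to know which two nodes lie at the ends of that longest shortest path, a variation on the same idea could return them together with the path length. This is only a sketch; the column names source, target, and hops are chosen here for illustration:

```cypher
// Same pairing idea, but keep the endpoints of the longest shortest path
MATCH (a), (b)
WHERE id(a) < id(b)
MATCH path = shortestPath((a)-[*]-(b))
RETURN id(a) AS source, id(b) AS target, length(path) AS hops
ORDER BY hops DESC
LIMIT 1
```

Pairs with no connecting path produce no row, so on a disconnected graph this reports the largest diameter among the connected components. Like the query above, it checks every pair of nodes and can be slow on large graphs.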

The counts per relationship type and the counts of nodes per label are kept in the counts store, and the easiest way to access these statistics is via APOC Procedures. APOC may come pre-bundled with your installation; check this install page (changing the version in the URL to match the one you're using for more specific instructions):

https://neo4j.com/labs/apoc/4.1/installation/

Once installed you can use CALL apoc.meta.stats() to access all graph counts. The nodeCount column will give you the total nodes in the graph, and the labels column will give you the counts per label. The relTypesCount column will give you the counts per relationship type.
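
If installing APOC is not an option, the same two breakdowns can be approximated in plain Cypher. These are minimal sketches that scan the graph rather than read the counts store, so expect them to be slower on large graphs; run them as two separate statements:

```cypher
// Nodes per label (a node with several labels is counted once per label;
// nodes without any label are not counted)
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS nodes
ORDER BY nodes DESC;

// Relationships per type
MATCH ()-[r]->()
RETURN type(r) AS type, count(*) AS relationships
ORDER BY relationships DESC;
```

With the load query from the question, the labels would be Vgsales, GENRE, and PUBLISHER, and the relationship types IN_GENRE and PUBLISHED.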

I believe the density of a directed graph is E / (V (V - 1)), where E is the total number of edges and V is the total number of vertices. We can get both from the counts store and apply that formula:

CALL apoc.meta.stats() YIELD nodeCount, relCount
RETURN toFloat(relCount) / (nodeCount * (nodeCount - 1)) as density
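
The same formula can also be written without APOC. This is a sketch that assumes the graph has at least two nodes and at least one relationship; v and e are just local names for the vertex and edge totals:

```cypher
// Density of a directed graph: E / (V * (V - 1))
MATCH (n)
WITH count(n) AS v
MATCH ()-[r]->()
WITH v, count(r) AS e
RETURN toFloat(e) / (v * (v - 1)) AS density
```

If the graph has no relationships at all, the second MATCH finds nothing and the query returns no row rather than 0.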

Upvotes: 2
