pr4n
pr4n

Reputation: 2978

Cypher - Find nodes which have more incoming than outgoing connections of a particular kind

I am modelling a transaction graph. This graph has

For a given user, I need to be able to find asset_group_id that they hold order by count of assets held in desc. Example output for a User with id = 12

User asset_group_id assets_held
12 102 32
12 76 30
12 115 24

My dataset contains ~10K Users and ~300K relationships

My approach

In order to do this, I need to find nodes which have more incoming relationships than outgoing. the diff of incoming and outgoing is held. (When they buy more and sell less)

there are also case where a User will just have incoming relationship for that asset_group_id and no outgoing relationship. (When they buy but dont sell)

Cypher

The Cypher I have tried are following and none of them is close to what I expect. How do I prepare the query for this query

Query 1

MATCH
  (a:User{id:'12'})<-[t:TXN]-(:User)
WHERE
  t.asset_group_id IN ["category1", "category2"]
MATCH
  (a)-[t1:TXN{asset_group_id: t.asset_group_id}]->(:User)
WITH
  t, t1, COUNT(t) as inC, COUNT(t1) as outC
RETURN 
  t.asset_group_id, inC , outC,  inC - outC as held
ORDER by held DESC
LIMIT 100

This query is also very slow, and I don't think optimal for a large dataset.

Query 2

This Query is needed to find all cases where people buy and dont sell

MATCH
    (:User)-[r:TXN]->(b:User{id: '12'})
  WHERE
    NOT EXISTS(
      (b)-[:TXN{asset_group_id: r.asset_group_id}]->(:User)
    )
  WITH
      r.asset_group_id as asset_group_id
RETURN owner

How do I write a better and optimised query. Ideally I would want to have a single query for this data

Upvotes: 0

Views: 165

Answers (2)

Charchit Kapoor
Charchit Kapoor

Reputation: 9284

You can match for relationships without providing a direction, and then calculate the counts conditionally, something like this:

MATCH
  (a:User{id:'12'})-[t:TXN]-(b:User)
WHERE
  t.asset_group_id IN ["category1", "category2"]
WITH
  t.asset_group_id AS asset_group_id, CASE WHEN startNode(t) = b THEN 1 ELSE 0 END as inC, CASE WHEN startNode(t) = a THEN 1 ELSE 0 END as outC
WITH 
  asset_group_id, SUM(inC) AS incomingCount, SUM(outC) AS outgoingCount
RETURN asset_group_id, incomingCount, outgoingCount, incomingCount - outgoingCount AS held
ORDER by held DESC
LIMIT 100

Upvotes: 1

Graphileon
Graphileon

Reputation: 5385

Perhaps something along these lines (not tested)

MATCH (seller:User)-[t:TXN]->(buyer:User)
WITH t.asset_group_id AS asset_group,
     [seller,buyer] AS parties,
     count(t) AS txns
UNWIND parties AS user
WITH user,asset_group,
     CASE WHEN user = parties[0] THEN txns * -1 ELSE txns END AS change
RETURN user, asset_group, sum(change) AS changes

In which changes would return 0 if there was no change, a negative if the user sold more that that he bought

Upvotes: 0

Related Questions