Reputation: 93
I have a graph database in Neo4j with drugs and drug-drug interactions, among other entities. A ()-[:IS_PARTICIPANT_IN]->() relationship connects a drug to an interaction. I need to obtain those pairs of drugs a and b that are not involved in any :IS_PARTICIPANT_IN relationship other than the one they share, i.e. (a)-[:IS_PARTICIPANT_IN]->(ddi:DrugDrugInteraction)<-[:IS_PARTICIPANT_IN]-(b), with no other IS_PARTICIPANT_IN relationships involving either a or b.
For that purpose, I have tried the following Cypher query. However, it runs out of heap space (even after raising the heap to 8 GB), because the collect operations consume too much memory.
MATCH (drug1:Drug)-[r1:IS_PARTICIPANT_IN]->(ddi:DrugDrugInteraction)
MATCH (drug2:Drug)-[r2:IS_PARTICIPANT_IN]->(ddi)
WHERE drug1 <> drug2
OPTIONAL MATCH (drug2)-[r3:IS_PARTICIPANT_IN]->(furtherDDI:DrugDrugInteraction)
WHERE furtherDDI <> ddi
WITH drug1, drug2, ddi, COLLECT(ddi) AS ddis, furtherDDI, COLLECT(furtherDDI) AS additionalDDIs
WITH drug1, drug2, ddi, COUNT(ddis) AS n1, COUNT(additionalDDIs) AS n2
WHERE n1 = 1 AND n2 = 0
RETURN drug1.name, drug2.name, ddi.name ORDER BY drug1;
How can I improve my code so as to get the desired results without exceeding the heap size limit?
Upvotes: 0
Views: 66
Reputation: 66989
This should work:
MATCH (d:Drug)
WHERE SIZE((d)-[:IS_PARTICIPANT_IN]->()) = 1
MATCH (d)-[:IS_PARTICIPANT_IN]->(ddi)
RETURN ddi.name AS ddiName, COLLECT(d.name) AS drugNames
ORDER BY drugNames[0]
The WHERE clause uses a very efficient degree check to filter for Drug nodes that have exactly one outgoing IS_PARTICIPANT_IN relationship. This check is efficient because it does not have to actually get any DrugDrugInteraction nodes.
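As a hedged side note: the SIZE(pattern) form used in the WHERE clause was, as far as I know, removed in Neo4j 5. A minimal sketch of the same degree check for newer versions, assuming the COUNT { ... } subquery expression is available in your installation, might look like this:

```cypher
// Sketch for Neo4j 5.x (assumption: COUNT { ... } subqueries are supported).
// Keep only Drug nodes with exactly one outgoing IS_PARTICIPANT_IN relationship.
MATCH (d:Drug)
WHERE COUNT { (d)-[:IS_PARTICIPANT_IN]->() } = 1
// Fetch the single interaction each remaining drug participates in.
MATCH (d)-[:IS_PARTICIPANT_IN]->(ddi)
RETURN ddi.name AS ddiName, COLLECT(d.name) AS drugNames
ORDER BY drugNames[0]
```

Running the query with EXPLAIN or PROFILE should show whether the planner resolves the count as a degree lookup rather than expanding the relationships.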
After the degree check, the query performs a second MATCH to actually get the associated DrugDrugInteraction node. (I assume that the IS_PARTICIPANT_IN relationship only points at DrugDrugInteraction nodes, and have therefore omitted the label from the search pattern, for efficiency.)
The RETURN clause uses the aggregating function COLLECT to collect the Drug names for each ddi name. (I assume that ddi nodes have unique names.)
By the way, this query will also work if any number of Drug nodes (not just 2) participate in the same DrugDrugInteraction and in no other one. Also, if a matched DrugDrugInteraction happens to have a related Drug that participates in other interactions, this query will not include that Drug in the result (since this query only pays attention to d nodes that passed the initial degree check).
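If only strict pairs are wanted, as the question describes, one possible variation is to filter after the aggregation. This is only a sketch, written in the same SIZE(pattern) syntax as the query above; it assumes every participant of an interaction is attached through an incoming IS_PARTICIPANT_IN relationship:

```cypher
// Sketch: keep only interactions whose participants are exactly two drugs,
// each of which has no other IS_PARTICIPANT_IN relationship.
MATCH (d:Drug)
WHERE SIZE((d)-[:IS_PARTICIPANT_IN]->()) = 1
MATCH (d)-[:IS_PARTICIPANT_IN]->(ddi)
WITH ddi, COLLECT(d.name) AS drugNames
// Two single-interaction drugs were collected, and the interaction has
// no further participants (assumption: participants are always linked
// by an incoming IS_PARTICIPANT_IN relationship).
WHERE SIZE(drugNames) = 2
  AND SIZE((ddi)<-[:IS_PARTICIPANT_IN]-()) = 2
RETURN ddi.name AS ddiName, drugNames
ORDER BY drugNames[0]
```

Grouping by the ddi node instead of ddi.name also avoids relying on interaction names being unique.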
Upvotes: 3