Scott Campbell
Scott Campbell

Reputation: 81

Neo4j/cypher query optiminzation

The query below is attempting to find all subtypes of a concept within an ontology. Once all concepts are identified, the query attempts to identify all concepts within the hierarchy that are defined as "primitive" (a property definitionStatusId) and which have supertype and subtype concepts that are defined as "fully defined" (using the same property definitionStatusId).

The number of concepts in the hierarchy is 101,000 (several million paths), the total number of "primitive" concepts = 56,000, and the total number of non-leaf node primitives = 13,000.

The webadmin hangs when executing the query. So, it is obviously not written well. I am trying to reduce the scope of the query at each step, but certainly I am not doing it correctly!

Suggestions?

//Find all clinical finding subtypes
MATCH (a:ObjectConcept{sctid:404684003})<-[:ISA*]-(b:ObjectConcept)
with collect (DISTINCT b) as set1
//Find all primitive subtypes of clinical findings
UNWIND set1 as x1
MATCH x1
WHERE x1.definitionStatusId = 900000000000074008
with collect (DISTINCT x1) as prim,set1
//Find all fully-defined subtypes of clinical findings
UNWIND set1 as x2
MATCH x2
WHERE x2.definitionStatusId = 900000000000073002
with collect (DISTINCT x2) as full, prim
//Find all primitives with a fully-defined subtype
UNWIND prim as prim1
UNWIND full as full1
MATCH prim1,full1
WHERE EXISTS((prim1)<-[:ISA*]-(full1)) as prim11
with collect (DISTINCT prim11) as intprim, full1
//Find all high level primitives with a fully-defined supertype
UNWIND intprim as intprim1
MATCH intprim1,full1
WHERE EXISTS((full1)<-[:ISA*]-(intprim1)) as intprim11
with collect (DISTINCT intprim11) as intprim2
//Find all high level primitives with fully defined supertypes
//that also have fully defined subtypes
UNWIND intprim2 as intprim22
return intprim22.sctid,intprim22.FSN

Upvotes: 2

Views: 92

Answers (2)

Scott Campbell
Scott Campbell

Reputation: 81

The follow query produced a valid answer without crashing the web browser. However, it still took an extended period of time to complete. So, additional optimization is necessary.

//Find all subtypes of ObjectConcept
MATCH (a:ObjectConcept{sctid:404684003})<-[:ISA*]-(b:ObjectConcept{definitionStatusId:900000000000074008})
//Remove duplicate nodes
with distinct(b) as prim
//Remove leaf nodes
MATCH (prim)<-[:ISA*1..1]-()
with distinct(prim) as prim1
//Find all intermediate primitive nodes satisfying the requirements
MATCH (c:ObjectConcept{definitionStatusId:900000000000073002})<-[:ISA*]-(prim1)<-[:ISA*]-(d:ObjectConcept{definitionStatusId:900000000000073002})
return DISTINCT prim1.sctid,prim1.FSN

Upvotes: 0

Martin Preusse
Martin Preusse

Reputation: 9369

Your query looks very complicated. If I understand correctly, you want to find all ObjectConcept nodes that are "primitive" and have a parent and child that are "fully defined":

(ObjectConcept "fully defined")
   |
[:ISA]
   |
(ObjectConcept "primitive")  <-- find this?
   |
[:ISA]
   |
(ObjectConcept "fully defined")

I think you can simply match:

MATCH (oc_parent:ObjectConcept {definitionStatusId: 'fully'})-[:ISA]->
      (oc:ObjectConcept {definitionStatusId: 'primitive'})-[:ISA]->
      (oc_child:ObjectConcept {definitionStatusId: 'fully'})
RETURN oc_parent, oc, oc_child

All the DISTINCT, collect() and UNWIND do not necessarily make your query faster.

You should have an index on definitionStatusId:

CREATE INDEX ON :ObjectConcept(definitionStatusId)

Update

You can query for the ObjectConcepts in a sub hierarchy first but the [:ISA*] query might look for very long paths and could be very expensive. 300.000 nodes is not that much, so the first solution might actually be faster. I would only do that if you really have to filter for a subset of the nodes.

// get the sub-hierarchy first
MATCH (:ObjectConcept {sctid:404684003})<-[:ISA*]-(oc:ObjectConcept {definitionStatusId: 'primitive'})
WITH oc
MATCH (oc_parent:ObjectConcept {definitionStatusId: 'fully'})-[:ISA]->
      (oc)-[:ISA]->
      (oc_child:ObjectConcept {definitionStatusId: 'fully'})
RETURN oc_parent, oc, oc_child

Upvotes: 1

Related Questions