Reputation: 446
I've just started practicing with Neo4j. After reading some docs and tutorials I created my data model, and I've run into a problem with nested nodes: a node that's related to another node with the same label, and so on (there is no maximum to how many levels down the relationships go).
So basically a type belongs to a Person, and each type can be a child of another type. What I'm attempting is to read all the types related to a person (and the child type nodes related to the highest type node). There are between 2000 and 3000 types per person, and the nested type nodes can be as many as 10 levels deep.
Here is what I'm trying:
MATCH (p:Person{name: 'Bill'})-[*0..]-(t:Type) RETURN p, t
This makes the JVM run out of memory. The DB is hosted on a box with 2GB of memory for now and will be upgraded later, but I still feel like I'm missing something. How can such a query require so much memory?
If I run the same query with a depth limit, it runs fine up to depth 5, for example:
MATCH (p:Person{name: 'Bill'})-[*0..4]-(t:Type) RETURN p, t
This query works fine and gives me back about 2308 nodes and 2307 relationships for one Person.
Is the solution just to host the DB on a box with more memory? Even then, what happens when the DB gets multiple requests at the same time?
I feel like I'm missing something.
Upvotes: 0
Views: 77
Reputation: 30417
You may want to use APOC Procedures here. You don't seem to be interested in the paths themselves, just the nodes, so the procedure apoc.path.subgraphNodes() may do the trick (ensuring you're using whatever the correct direction is for the :CHILD_OF relationships to follow):
MATCH (p:Person{name: 'Bill'})<-[:BELONGS_TO]-(t:Type)
CALL apoc.path.subgraphNodes(t, {relationshipFilter:'CHILD_OF>'}) YIELD node
RETURN p, node
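A variation on the call above, as a rough sketch only: it starts the expansion at the Person node and follows both relationship types inward. The relationship type names (BELONGS_TO, CHILD_OF) and their direction are assumptions about the data model, so adjust them to match yours. The APOC plugin must be installed on the server, and maxLevel is set to 10 only because the question mentions roughly 10 levels of nesting.

```cypher
// Assumed model: (:Type)-[:BELONGS_TO]->(:Person) and (:Type)-[:CHILD_OF]->(:Type)
MATCH (p:Person {name: 'Bill'})
CALL apoc.path.subgraphNodes(p, {
  relationshipFilter: '<BELONGS_TO|<CHILD_OF',  // incoming relationships only
  maxLevel: 10                                  // cap the expansion depth
}) YIELD node
// Keep only Type nodes, since the start node may also be yielded
WITH p, node
WHERE node:Type
RETURN p, node
```

Because subgraphNodes() visits each node at most once, it avoids enumerating every distinct path to the same node.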
Upvotes: 0
Reputation: 67044
Unbounded variable length paths are notoriously memory and time intensive, since the size of the search tree grows exponentially with the depth of the search.
Aside from that, I think there is another reason why your query is taking so long. Your query's relationship pattern (-[*0..]-) is non-directional. Therefore, the variable length search will not just go from a child type to its parent type, but also back down from a parent type to all its child types. This results in searching through a lot of types (potentially all of them) that you are not really interested in, and your results would be invalid.
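One way to see this growth for yourself is to count paths per depth with a small upper bound. This is only a diagnostic sketch based on the query from the question; the bound of 4 is the one reported to work there, and raising it makes the query explore the same paths that exhausted memory.

```cypher
// Compare how many paths are found with how many distinct Type nodes they reach
MATCH path = (p:Person {name: 'Bill'})-[*0..4]-(t:Type)
RETURN length(path) AS depth,
       count(*) AS paths,
       count(DISTINCT t) AS distinctTypes
ORDER BY depth
```

If paths is much larger than distinctTypes at a given depth, the same nodes are being reached through many different routes, which is where the memory goes.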
Since your data model's relationships all point in the direction of a Person, this might work better for you (i.e., you might be able to get it to work with a higher upper bound than 4; in this example, I used 10):
MATCH (p:Person {name: 'Bill'})<-[*..10]-(t:Type)
RETURN p, t
This query uses a directional relationship pattern. It also lets the lower variable length bound default to 1 (since presumably no Person nodes are also Type nodes, anyway), which will eliminate some wasted processing time.
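If the relationship types are known, naming them in the pattern narrows the search further. The sketch below assumes the types are called BELONGS_TO and CHILD_OF; those names do not appear in the question, so substitute your own.

```cypher
// Directional, bounded, and restricted to the assumed relationship types
MATCH (p:Person {name: 'Bill'})<-[:BELONGS_TO|CHILD_OF*..10]-(t:Type)
RETURN p, collect(DISTINCT t) AS types
```

Note that DISTINCT only de-duplicates the returned rows. The paths are still expanded, so the direction, the type restriction, and the upper bound are what actually limit the work.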
By the way, if you do not already have an index on Person(name), you should consider adding one to speed up the finding of the Person node (although that would not be the reason why you ran out of memory).
Upvotes: 2