Juanpe

Reputation: 446

Neo4J - Nested node causes java.lang.OutOfMemoryError on JVM

I've just started practicing with Neo4J, and after reading some docs and tutorials I have created my data model. I've come across a problem when dealing with nested nodes: a node that's related to another node with the same label, and so on (there is no maximum to how many levels down the relationships go).

Here is a quick overview of my graph model

So basically a type belongs to a Person, and each type can be a child of another type. What I'm attempting is to read all the types related to a person (including the child type nodes nested under each top-level type node). There are between 2000 and 3000 types per person, and the nested type nodes can be as many as 10 levels deep.


Here is what I'm trying:

MATCH (p:Person{name: 'Bill'})-[*0..]-(t:Type) RETURN p, t

This makes the JVM run out of memory. The DB is hosted on a box with 2GB of memory for now and will be upgraded later, but I still feel like I'm missing something: how can such a query require so much memory?

If I run the same query with a depth limit, it runs fine up to depth 5. For example:

MATCH (p:Person{name: 'Bill'})-[*0..4]-(t:Type) RETURN p, t

This query works fine and gives me back about 2308 nodes and 2307 relationships for one Person.


Is the solution just to host the DB on a box with more memory? Even then, what happens when the DB gets multiple requests at the same time?

I feel like I'm missing something.

Upvotes: 0

Views: 77

Answers (2)

InverseFalcon

Reputation: 30417

You may want to use APOC Procedures here. You don't seem to be interested in the paths themselves, just the nodes, so the procedure apoc.path.subgraphNodes() may do the trick (ensuring you're using whatever the correct direction is for the :CHILD_OF relationships to follow):

MATCH (p:Person{name: 'Bill'})<-[:BELONGS_TO]-(t:Type)
CALL apoc.path.subgraphNodes(t, {relationshipFilter:'CHILD_OF>'}) YIELD node
RETURN p, node
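
If the :CHILD_OF relationships point from a child type up to its parent (an assumption, since the question's diagram is not reproduced here), the expansion from each top-level type has to follow them in the incoming direction instead. A sketch of that variant, with an optional maxLevel cap to match the stated depth of about 10:

```cypher
// Assumes (child:Type)-[:CHILD_OF]->(parent:Type); flip the filter if yours point the other way.
MATCH (p:Person {name: 'Bill'})<-[:BELONGS_TO]-(t:Type)
CALL apoc.path.subgraphNodes(t, {
  relationshipFilter: '<CHILD_OF',  // leading '<' = follow incoming relationships
  maxLevel: 10                      // optional safety cap on expansion depth
}) YIELD node
RETURN p, collect(DISTINCT node) AS types
```

Within each call, subgraphNodes visits every reachable node only once. That keeps memory use proportional to the number of nodes rather than the number of paths.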

Upvotes: 0

cybersam

Reputation: 67044

Unbounded variable length paths are notoriously memory and time intensive, since the size of the search tree grows exponentially with the depth of the search.

Aside from that, I think there is another reason why your query is taking so long. Your query's relationship pattern (-[*0..]-) is non-directional. Therefore, the variable length search will not just go from a child type to its parent type, but also back down from a parent type to all its child types. This results in searching through a lot of types (potentially all of them) that you are not really interested in, and your results would be invalid.
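
One way to see this effect on your own data is to compare, per depth, how many paths the pattern matches with how many distinct Type nodes those paths end on. A rough diagnostic sketch reusing the bounded query from the question (the column aliases are my own):

```cypher
// Count matched paths vs. distinct end nodes for each path length.
MATCH path = (p:Person {name: 'Bill'})-[*0..4]-(t:Type)
RETURN length(path) AS depth,
       count(*) AS paths,
       count(DISTINCT t) AS distinctTypes
ORDER BY depth
```

If paths is much larger than distinctTypes at some depth, the same nodes are being reached by many different routes, and that redundancy gets multiplied as the upper bound grows.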

Since your data model's relationships all point in the direction of a Person, this might work better for you (i.e., you might be able to get it to work with a higher upper bound than 4; in this example, I used 10):

MATCH (p:Person {name: 'Bill'})<-[*..10]-(t:Type)
RETURN p, t

This query uses a directional relationship pattern. It also lets the lower variable length bound default to 1 (since presumably no Person nodes are also Type nodes, anyway), which will eliminate some wasted processing time.
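
Restricting the relationship types can narrow the search further. A sketch, assuming the types are named BELONGS_TO and CHILD_OF (as in the other answer) and that both point toward the Person:

```cypher
// Only follow the two relationship types that make up the type hierarchy.
MATCH (p:Person {name: 'Bill'})<-[:BELONGS_TO|CHILD_OF*..10]-(t:Type)
RETURN p, collect(DISTINCT t) AS types
```

Collecting the distinct types returns one row per Person rather than one row per matched path, which should also reduce the size of the result sent back to the client.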

By the way, if you do not already have an index on Person(name), you should consider adding one to speed up the finding of the Person node (although that would not be the reason why you ran out of memory).
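
For reference, creating that index might look like the sketch below. The exact syntax depends on your Neo4j version, and the index name person_name is just a placeholder:

```cypher
// Neo4j 4.x and later (named index):
CREATE INDEX person_name FOR (p:Person) ON (p.name);

// Neo4j 3.x used the older form instead:
// CREATE INDEX ON :Person(name);
```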

Upvotes: 2
