Reputation: 21
I have a cypher query that is creating the following in one shot:
- 500 primary nodes
- 12 nodes related to each of the primary nodes (including the relationship)
- 2 relationships on two existing nodes (the 500 primary nodes will be in relation with the same two nodes)
At the end, it will create:
- 7000 nodes
- 7000 relationships
At the end of the query I have this statement to return all created entities :
MATCH (n:Primary) WHERE n.id IN {ids} WITH n MATCH path=(n)-[*0..]->() RETURN path
The query is taking around 15 seconds to be executed by using a remote neo4j database located on another machine on the same network.
I was expecting to have a better result than this, so I did some profiling and here is what I got:
85.0% - 10,816 ms org.neo4j.ogm.session.Neo4jSession.query
85.0% - 10,816 ms org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.query
85.0% - 10,816 ms org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.executeAndMap
72.3% - 9,202 ms org.neo4j.ogm.context.GraphEntityMapper.map
63.0% - 8,019 ms org.neo4j.ogm.context.GraphEntityMapper.map
52.1% - 6,628 ms java.lang.Class.isAssignableFrom
10.8% - 1,371 ms org.neo4j.ogm.context.GraphEntityMapper.mapEntities
0.1% - 9,967 µs org.neo4j.ogm.context.MappingContext.getNodeEntity
So from my understanding, the query is executed in the method executeAndMap from the ExecuteQueriesDelegates class and the response is immediately returned in there.
After this, this is using a GraphEntityMapper to create the entities from the response.
So I conclude that the query itself is executed in a reasonable amount of time (less than 2 seconds - I verified that by posting the query directly from a REST client) but more than 70% of the time is spent to transform the result of the response into my model entities.
So I suppose that it normal that it takes some time to do this transformation but it seems that a way too much time is spent doing the following test:
if ... type.isAssignableFrom(o.getClass())
So I am not sure if any improvement can be done on neo4j-ogm side. In the meantime, is there any workaround that can help me to improve the performances?
02/10/2017 Additional info
The result of the profiling above was done by using Sampling setting.
I've done new measurements using the Instrumentation settings and here are the results which differ a little bit from the last results. For this test, I defined only 200 primary nodes because I was getting an exception with 500 (was taking too long time).
93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.Neo4jSession.query
93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.query
93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.executeAndMap
64.3% - 38,875 ms - 1 inv. org.neo4j.ogm.context.GraphEntityMapper.map
46.7% - 28,229 ms - 9,800 inv. org.neo4j.ogm.context.GraphEntityMapper.map
12.4% - 7,527 ms - 14,021,770 inv. org.neo4j.ogm.context.MappingContext.getNodeEntity
9.8% - 5,918 ms - 9,800 inv. org.neo4j.ogm.context.GraphEntityMapper.mapEntities
3.4% - 2,067 ms - 14,021,770 inv. java.lang.Class.isAssignableFrom
2.7% - 1,621 ms - 14,021,770 inv. java.util.Iterator.next
2.6% - 1,552 ms - 14,021,770 inv. java.lang.Object.getClass
2.5% - 1,523 ms - 14,031,570 inv. java.util.Iterator.hasNext
0.2% - 116 ms - 984,900 inv. java.util.List.add
0.2% - 107 ms - 984,900 inv. java.lang.Class.cast
0.0% - 2,173 µs - 9,800 inv. java.util.Set.iterator
0.0% - 1,731 µs - 9,800 inv. java.util.ArrayList.<init>
0.0% - 1,170 µs - 9,800 inv. java.util.List.isEmpty
We can see here that most of the time is not anymore spent by invoking the method isAssignableFrom.
I will continue my investigation and keep you informed if I find something.
Upvotes: 2
Views: 270
Reputation: 2181
There are a few things going on here.
About 35% of the overall time is spent simply fetching the completed response from the database and converting this to simple Node
and Edge
representations. I suspect the majority of this time is spent waiting on IO
Next, the mapping phase starts. In your case, the query is returning nearly 10,000 objects which includes both Node
and Edge
objects. Each Node
is looked up in the mapping context, to determine if it already exists, and is updated if not. This lookup links the Node
reference to an object in your domain. However, you'll see that there are more than 14 million calls to getNodeEntity
, which requires some explanation.
The majority of these calls are made when the Edge
elements in the response are being processed. Each of these requires an additional lookup in the mapping context for both the start and end nodes so that the reference can be set on the correct domain object(s). However, if during this process it turns out that a particular relationship is maintained in a collection or array by a domain object, then the collection or array is not updated immediately for each relationship as it is discovered. Instead, the set of relationships in the collection are grouped together and updating the domain object is deferred until after any single-value reference is set.
The assignment of collection-based references is then made from the list of Edge
objects collected earlier as described above. Unfortunately, the relevant domain object(s) are not maintained on these Edge
objects (only the database node ids). Consequently, this requires another round of lookups for the relevant domain objects from the mapping context. This could clearly be avoided, because they've been looked up once already.
I've raised a ticket for us to look at this. https://github.com/neo4j/neo4j-ogm/issues/327
Upvotes: 3