Claudal
Claudal

Reputation: 21

Performances issue with neo4j-ogm 2.1.1 with a cypher query creating nodes and related nodes in one shot

I have a cypher query that is creating the following in one shot:
- 500 primary nodes
- 12 nodes related to each of the primary nodes (including the relationship)
- 2 relationships on two existing nodes (the 500 primary nodes will be in relation with the same two nodes)

At the end, it will create:
- 7000 nodes
- 7000 relationships

At the end of the query I have this statement to return all created entities :

MATCH (n:Primary) WHERE n.id IN {ids} WITH n MATCH path=(n)-[*0..]->() RETURN path

The query is taking around 15 seconds to be executed by using a remote neo4j database located on another machine on the same network.

I was expecting to have a better result than this, so I did some profiling and here is what I got:

85.0% - 10,816 ms org.neo4j.ogm.session.Neo4jSession.query
  85.0% - 10,816 ms org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.query
    85.0% - 10,816 ms org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.executeAndMap
      72.3% - 9,202 ms org.neo4j.ogm.context.GraphEntityMapper.map
        63.0% - 8,019 ms org.neo4j.ogm.context.GraphEntityMapper.map
          52.1% - 6,628 ms java.lang.Class.isAssignableFrom
            10.8% - 1,371 ms org.neo4j.ogm.context.GraphEntityMapper.mapEntities
              0.1% - 9,967 µs org.neo4j.ogm.context.MappingContext.getNodeEntity

So from my understanding, the query is executed in the method executeAndMap from the ExecuteQueriesDelegates class and the response is immediately returned in there.

After this, this is using a GraphEntityMapper to create the entities from the response.

So I conclude that the query itself is executed in a reasonable amount of time (less than 2 seconds - I verified that by posting the query directly from a REST client) but more than 70% of the time is spent to transform the result of the response into my model entities.

So I suppose that it normal that it takes some time to do this transformation but it seems that a way too much time is spent doing the following test:

if ... type.isAssignableFrom(o.getClass())

So I am not sure if any improvement can be done on neo4j-ogm side. In the meantime, is there any workaround that can help me to improve the performances?


02/10/2017 Additional info

The result of the profiling above was done by using Sampling setting.

I've done new measurements using the Instrumentation settings and here are the results which differ a little bit from the last results. For this test, I defined only 200 primary nodes because I was getting an exception with 500 (was taking too long time).

93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.Neo4jSession.query
  93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.query
    93.6% - 56,585 ms - 1 inv. org.neo4j.ogm.session.delegates.ExecuteQueriesDelegate.executeAndMap
      64.3% - 38,875 ms - 1 inv. org.neo4j.ogm.context.GraphEntityMapper.map
        46.7% - 28,229 ms - 9,800 inv. org.neo4j.ogm.context.GraphEntityMapper.map
          12.4% - 7,527 ms - 14,021,770 inv. org.neo4j.ogm.context.MappingContext.getNodeEntity
          9.8% - 5,918 ms - 9,800 inv. org.neo4j.ogm.context.GraphEntityMapper.mapEntities
          3.4% - 2,067 ms - 14,021,770 inv. java.lang.Class.isAssignableFrom
          2.7% - 1,621 ms - 14,021,770 inv. java.util.Iterator.next
          2.6% - 1,552 ms - 14,021,770 inv. java.lang.Object.getClass
          2.5% - 1,523 ms - 14,031,570 inv. java.util.Iterator.hasNext
          0.2% - 116 ms - 984,900 inv. java.util.List.add
          0.2% - 107 ms - 984,900 inv. java.lang.Class.cast
          0.0% - 2,173 µs - 9,800 inv. java.util.Set.iterator
          0.0% - 1,731 µs - 9,800 inv. java.util.ArrayList.<init>
          0.0% - 1,170 µs - 9,800 inv. java.util.List.isEmpty

We can see here that most of the time is not anymore spent by invoking the method isAssignableFrom.

I will continue my investigation and keep you informed if I find something.

Upvotes: 2

Views: 270

Answers (1)

Vince
Vince

Reputation: 2181

There are a few things going on here.

About 35% of the overall time is spent simply fetching the completed response from the database and converting this to simple Node and Edge representations. I suspect the majority of this time is spent waiting on IO

Next, the mapping phase starts. In your case, the query is returning nearly 10,000 objects which includes both Node and Edge objects. Each Node is looked up in the mapping context, to determine if it already exists, and is updated if not. This lookup links the Node reference to an object in your domain. However, you'll see that there are more than 14 million calls to getNodeEntity, which requires some explanation.

The majority of these calls are made when the Edge elements in the response are being processed. Each of these requires an additional lookup in the mapping context for both the start and end nodes so that the reference can be set on the correct domain object(s). However, if during this process it turns out that a particular relationship is maintained in a collection or array by a domain object, then the collection or array is not updated immediately for each relationship as it is discovered. Instead, the set of relationships in the collection are grouped together and updating the domain object is deferred until after any single-value reference is set.

The assignment of collection-based references is then made from the list of Edge objects collected earlier as described above. Unfortunately, the relevant domain object(s) are not maintained on these Edge objects (only the database node ids). Consequently, this requires another round of lookups for the relevant domain objects from the mapping context. This could clearly be avoided, because they've been looked up once already.

I've raised a ticket for us to look at this. https://github.com/neo4j/neo4j-ogm/issues/327

Upvotes: 3

Related Questions