Reputation: 1387
I have a Spring Data Neo4j application that needs to bulk write and read data in Neo4j Community Edition (3.2).
My system is a MacBook Pro with 16 GB RAM and a 2.5 GHz Intel Core i7.
Total nodes: 120,000 (5 properties per node).
Each node has 500 relationships.
The nodes and relationships above are part of the initial data that other parts of the application need in order to work.
I use Spring Data Neo4j for read/write transactions. Each node builds its 500 relationships sequentially, so it takes a significant amount of time to build all of the nodes and relationships above.
Sample Code:
Entity:
// Neo4j entity class
import java.util.ArrayList;
import java.util.List;
import org.neo4j.ogm.annotation.GraphId;
import org.neo4j.ogm.annotation.NodeEntity;
import org.neo4j.ogm.annotation.Relationship;

@NodeEntity
public class SamplePojo {
    @GraphId
    public Long id;
    private String property1;
    private String property2;
    private Integer property3;
    private Double property4;
    private Integer property5;

    // Outgoing relationships to other SamplePojo nodes (500 per node in my data)
    @Relationship(type = "has_sample_relationship", direction = "OUTGOING")
    List<SamplePojo> sampleList = new ArrayList<>();

    // Getters and setters...
}
Repository:
import org.springframework.data.neo4j.repository.GraphRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface SamplePojoRepository extends GraphRepository<SamplePojo> {
    // save(...) is inherited from GraphRepository
}
Service class:
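import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class DataInsertion {
    @Autowired
    SamplePojoRepository repository;

    public void writeToNeo4j(List<SamplePojo> pojoList) {
        // Loop through more than 100,000 objects that have their properties and relationships set
        for (SamplePojo p : pojoList) {
            repository.save(p); // save each object to the Neo4j db, one call per object
        }
    }
}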
My Observation:
Initially, for the first few minutes, throughput was about 1200 write operations/minute.
After a few minutes, it dropped significantly, from 1200 to 100 write operations/minute.
Later, it dropped to 10 write operations/minute.
Does anyone know the root cause of why Neo4j write operations slow down over time?
Please let me know if additional information is needed and I will update the question. Thanks in advance!
Upvotes: 1
Views: 158
Reputation: 15076
This is a very broad question. You should at least profile your application to identify which part slows down: is it Neo4j itself? A particular query? Spring Data Neo4j? Your application? Then it will be easier to help you.
The usual suspects are:
your transaction is too large - split the load into smaller transactions of 1k to 50k elements (nodes + relationships + properties) - this is needed because Neo4j holds transaction state in memory, and it might spend too much time in GC (or even run out of memory) when you have large transactions.
growing OGM session - again causing too much time to be spent in GC - clear the Session from time to time (this should be done automatically by SDN when the @Transactional method finishes)
there is some operation without an index that becomes slow as the amount of data grows (e.g. doing a full node label scan instead of using an index)
low memory for Neo4j or your application - time is spent mostly in GC
there might be a performance issue with SDN/OGM - a reproducible test case would be great for this.
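As a rough illustration of the first suggestion, here is a minimal sketch of splitting a large list into fixed-size batches and handing each batch to a callback. The class name BatchedWriter and the method writeInBatches are hypothetical and not part of SDN or OGM. In the real application the callback would presumably be a call to a separate @Transactional service method that saves one batch through the repository, so that each batch gets its own transaction; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not an SDN/OGM API: it only partitions the work.
public class BatchedWriter {

    // Calls saveBatch once per chunk of at most batchSize items and
    // returns the number of chunks that were processed.
    public static <T> int writeInBatches(List<T> items, int batchSize,
                                         Consumer<List<T>> saveBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so the callback does not hold a view of the full list.
            saveBatch.accept(new ArrayList<>(items.subList(from, to)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            data.add(i);
        }
        // Stand-in for "save this batch in its own transaction".
        int n = writeInBatches(data, 1000,
                batch -> System.out.println("saving " + batch.size() + " items"));
        System.out.println(n + " batches");
    }
}
```

The batch size is something to tune by measurement. Since the answer counts nodes, relationships and properties together, a batch of entities that each carry 500 relationships would need to be much smaller than 1k entities to stay inside the suggested range.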
Upvotes: 5