Reputation: 11
I am trying to use Neo4j in my application and I am facing a few critical problems in my experiments. The problem statement is divided into the following parts.
BACKGROUND:
The use case is ingesting data from the internet in real time, at a scale of billions of nodes/relationships; the relationships are just person-to-person, with several properties each.
CONFIGURATION:
Machine configuration:
cpu: 24 processors, Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
memory: 165 203 696 kB
jdk: java version "1.7.0_67", Java(TM) SE Runtime Environment (build 1.7.0_67-b01), Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
Linux version: 2.6.32-431.el6.x86_64
OS: CentOS release 6.5
Neo4j configuration:
enterprise version: 2.1.5
jvm heap: default
objects cache:
neostore.nodestore.db.mapped_memory=512M
neostore.relationshipstore.db.mapped_memory=6G
neostore.propertystore.db.mapped_memory=5G
neostore.propertystore.db.strings.mapped_memory=1G
neostore.propertystore.db.arrays.mapped_memory=1G
client configuration:
py2neo, version 1.6.4
CODE IN CLIENT:
CYPHER_WEIGHT_COMPUTE='r.weight=r.weight+r.weight*EXP((TIMESTAMP()-r.update_time)/(r.half_life*1.0))'
# Initialisation: create uniqueness constraints on id for each label
self.query = neo.CypherQuery(self.graph_db,
    'CREATE CONSTRAINT ON (pn:UID) ASSERT pn.id IS UNIQUE')
self.query.execute()
self.query = neo.CypherQuery(self.graph_db,
    'CREATE CONSTRAINT ON (pm:GID) ASSERT pm.id IS UNIQUE')
self.query.execute()
# Cypher clause (the self.create_rels template)
MERGE (first:{TYPE1} {{id:'{val1}'}})
MERGE (second:{TYPE2} {{id:'{val2}'}})
MERGE (first)-[r:{RTYPE}]->(second)
  ON CREATE SET r.weight={weight_set}
  ON MATCH SET {weight_compute}
WITH r
SET r.half_life={half_life},
    r.update_time=TIMESTAMP(),
    r.threshold={threshold}
WITH r
WHERE r.weight<r.threshold
DELETE r
self.query = neo.CypherQuery(self.graph_db, self.create_rels.format(
    TYPE1=entity1[0], val1=entity1[1],
    TYPE2=entity2[0], val2=entity2[1],
    RTYPE=rel_type, weight_set=weight_set,
    weight_compute=CYPHER_WEIGHT_COMPUTE,
    half_life=half_life, threshold=threshold))
self.query.execute()
RESULT:
When I use 24 Python threads with py2neo to write 59229 nodes, 236048 relationships and 531325 properties, the average time is about 1316 seconds. This cannot meet my real-time requirement; it would work for me if the time dropped to about 150 seconds. The time per node/relationship also increases as the data scale grows.
QUESTIONS:
Is there any way to improve write performance other than optimising the Cypher clause and using batch insertion? I have tried configuring different sizes for the JVM heap and the object cache, and found that it had little effect on write performance. I think the reason may be the small scale of my test data (thousands to tens of thousands of nodes/relationships); the effect may be significant at a much larger scale (tens of millions, billions).
How many nodes per second (nps) or relationships per second (rps) can Neo4j reach for reads and writes, in your experience, at a scale of billions of nodes/relationships?
I also found that Neo4j cannot shard automatically, but there is a section about cache-based sharding in the documentation. If I use cache-based sharding with HAProxy, how are the relationships between nodes that have been sharded to different machines maintained? That is to say, how do I make sure the relationships are not broken by the sharding?
Can master/slave mode be used in both the community and enterprise versions?
Thanks in advance.
Regards
Upvotes: 1
Views: 866
Reputation: 33145
Must you do these as different requests? I recommend you use the transactional cypher endpoint: http://nigelsmall.com/py2neo/1.6/cypher/#id2
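A minimal sketch of that approach, assuming the py2neo 1.6 cypher Session API documented at the link above (the server URL, batch size and the write_batch helper name are placeholders, and the Cypher simply mirrors the template from the question):

# Sketch: send many MERGE statements in one transactional request with
# py2neo 1.6's cypher Session, instead of one HTTP call per relationship.
from py2neo import cypher

session = cypher.Session("http://localhost:7474")  # placeholder URL

STATEMENT = (
    "MERGE (first:%s {id:{val1}}) "
    "MERGE (second:%s {id:{val2}}) "
    "MERGE (first)-[r:%s]->(second) "
    "  ON CREATE SET r.weight={weight_set} "
    "  ON MATCH SET r.weight=r.weight+r.weight*"
    "EXP((TIMESTAMP()-r.update_time)/(r.half_life*1.0)) "
    "WITH r "
    "SET r.half_life={half_life}, r.update_time=TIMESTAMP(), "
    "    r.threshold={threshold} "
    "WITH r WHERE r.weight<r.threshold DELETE r"
)

def write_batch(pairs, rel_type, weight_set, half_life, threshold):
    """pairs: iterable of ((label1, id1), (label2, id2)) tuples (placeholder helper)."""
    tx = session.create_transaction()
    for (label1, id1), (label2, id2) in pairs:
        # Labels and relationship types cannot be query parameters in Cypher,
        # so they are interpolated; the property values are passed as parameters,
        # which also lets the server cache the query plan.
        tx.append(STATEMENT % (label1, label2, rel_type),
                  {"val1": id1, "val2": id2, "weight_set": weight_set,
                   "half_life": half_life, "threshold": threshold})
    tx.commit()  # one round trip for the whole batch

Appending a few hundred to a few thousand statements per commit is usually a better starting point than one HTTP request per relationship.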
It depends on the query, the API used, and how you count reads and writes. With the transactional HTTP API, I've managed to get 30k Cypher CREATE statements per second, each with two nodes and a rel. MERGE is a fair bit slower, and you need to make sure you're using the constraint index.
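To check that, you can profile the MERGE in the Neo4j shell or browser; the id value below is just a placeholder and the exact operator names vary between Neo4j versions, but you want to see an index/constraint seek on :UID(id) rather than a scan over all :UID nodes (note that PROFILE actually executes the statement):

// PROFILE executes the statement and returns the query plan
PROFILE MERGE (pn:UID {id:'example-id'}) RETURN pn;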
The idea is to keep a subset of the data cached by routing a subset of users (or any subset you can define) to particular cluster nodes for their queries. If the relationship a query needs to follow isn't cached, it will end up reading from disk. All data must be on the disks of all members of the cluster.
I'm not certain, but I'm pretty sure all the clustering features come with enterprise.
Upvotes: 5