BC Smith
BC Smith

Reputation: 31

Neo4j Performance Challenge - How to Improve?

I've been wrangling with Neo4J for the last few weeks, trying to resolve some extremely challenging performance problems. At this point, I need some additional help because I can't determine how to move forward.

I have a graph with a total of approx 12.5 Million nodes and 64 Million relationships. The purpose of the graph is going to be analyzing suspicious financial behavior, so it is customers, accounts, transactions, etc.

Here is an example of the performance challenge:

Relevant Schema

 ON :Transaction(baseamount)    ONLINE                             
 ON :Transaction(type)          ONLINE                             
 ON :Transaction(amount)        ONLINE                             
 ON :Transaction(currency)      ONLINE                             
 ON :Transaction(basecurrency)  ONLINE                             
 ON :Transaction(transactionid) ONLINE (for uniqueness constraint)

Profile of Query:

neo4j-sh (?)$ PROFILE MATCH (t:Transaction) WHERE t.amount > 8000.00 return count(t);  
+----------+
| count(t) |
+----------+
| 10696    |
+----------+
1 row

ColumnFilter
  |
  +EagerAggregation
    |
    +Filter
      |
      +NodeByLabel

+------------------+---------+----------+-------------+------------------------------------------+
|         Operator |    Rows |   DbHits | Identifiers |                                    Other |
+------------------+---------+----------+-------------+------------------------------------------+
|     ColumnFilter |       1 |        0 |             |                    keep columns count(t) |
| EagerAggregation |       1 |        0 |             |                                          |
|           Filter |   10696 | 13216382 |             | Property(t,amount(62)) > {  AUTODOUBLE0} |
|      NodeByLabel | 6608191 |  6608192 |        t, t |                             :Transaction |
+------------------+---------+----------+-------------+------------------------------------------+

Here are details of my setup:

System: Linux, Ubuntu, 16GB RAM, 3.5 i5 Proc, 256GB SSD HD

CPU

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 4230.625
cache size  : 6144 KB

Memory

$ cat /proc/meminfo
MemTotal:       16115020 kB
MemFree:          224856 kB
MemAvailable:    8807160 kB
Buffers:          124356 kB
Cached:          8429964 kB
SwapCached:         8388 kB

Disk

$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/data1--vg-root  219G   32G  177G  16% /

Neo4J.properties

neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.relationshipgroupstore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M
neostore.propertystore.db.index.keys.mapped_memory=200M
relationship_auto_indexing=true

Neo4J-Wrapper.properties

wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

#********************************************************************
# JVM Parameters
#********************************************************************

wrapper.java.additional=-XX:+UseConcMarkSweepGC
wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional=-XX:-OmitStackTraceInFastThrow

# Uncomment the following lines to enable garbage collection logging
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
wrapper.java.additional=-XX:+PrintPromotionFailure
wrapper.java.additional=-XX:+PrintTenuringDistribution

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=4096
wrapper.java.maxmemory=6144

Other:

Please let me know if I should add anything else that would be salient to the discussion


Update 1

Thank you very much for your help, I just moved so I was delayed in responding.

  1. Size of Neostore Files:

    /data/graph.db$ ls -lah neostore.*
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.id
    -rw-rw-r-- 1 neo4j neo4j  110 Apr  2 13:03 neostore.labeltokenstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  874 Apr  2 13:03 neostore.labeltokenstore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.labeltokenstore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 200M Apr  2 13:03 neostore.nodestore.db
    -rw-rw-r-- 1 neo4j neo4j   41 Apr  2 13:03 neostore.nodestore.db.id
    -rw-rw-r-- 1 neo4j neo4j   68 Apr  2 13:03 neostore.nodestore.db.labels
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.nodestore.db.labels.id
    -rw-rw-r-- 1 neo4j neo4j 2.8G Apr  2 13:03 neostore.propertystore.db
    -rw-rw-r-- 1 neo4j neo4j  128 Apr  2 13:03 neostore.propertystore.db.arrays
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.arrays.id
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.id
    -rw-rw-r-- 1 neo4j neo4j  720 Apr  2 13:03 neostore.propertystore.db.index
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.id
    -rw-rw-r-- 1 neo4j neo4j 3.1K Apr  2 13:03 neostore.propertystore.db.index.keys
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.index.keys.id
    -rw-rw-r-- 1 neo4j neo4j 1.7K Apr  2 13:03 neostore.propertystore.db.strings
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.propertystore.db.strings.id
    -rw-rw-r-- 1 neo4j neo4j  47M Apr  2 13:03 neostore.relationshipgroupstore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshipgroupstore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.1G Apr  2 13:03 neostore.relationshipstore.db
    -rw-rw-r-- 1 neo4j neo4j 1.6M Apr  2 13:03 neostore.relationshipstore.db.id
    -rw-rw-r-- 1 neo4j neo4j  165 Apr  2 13:03 neostore.relationshiptypestore.db
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.id
    -rw-rw-r-- 1 neo4j neo4j 1.3K Apr  2 13:03 neostore.relationshiptypestore.db.names
    -rw-rw-r-- 1 neo4j neo4j    9 Apr  2 13:03 neostore.relationshiptypestore.db.names.id
    -rw-rw-r-- 1 neo4j neo4j 3.5K Apr  2 13:03 neostore.schemastore.db
    -rw-rw-r-- 1 neo4j neo4j   25 Apr  2 13:03 neostore.schemastore.db.id
    
  2. I read that mapped memory settings are replaced by another cache, and I have commented out those settings.

  3. Java Profiler

       JvmTop 0.8.0 alpha - 16:12:59,  amd64,  4 cpus, Linux 3.16.0-33, load avg 0.30
       http://code.google.com/p/jvmtop
    
       Profiling PID 4260:            org.neo4j.server.Bootstrapper 
    
        68.67% (    14.01s) org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.read()
        18.73% (     3.82s) org.neo4j.kernel.impl.nioneo.store.StoreFailureException.<init>()
         2.86% (     0.58s) org.neo4j.kernel.impl.cache.ReferenceCache.put()
         1.11% (     0.23s) org.neo4j.helpers.Counter.inc()
         0.87% (     0.18s) org.neo4j.kernel.impl.cache.ReferenceCache.get()
         0.65% (     0.13s) org.neo4j.cypher.internal.compiler.v2_1.parser.Literals$class.PropertyKeyName()
         0.63% (     0.13s) org.parboiled.scala.package$.getCurrentRuleMethod()
         0.62% (     0.13s) scala.collection.mutable.OpenHashMap.<init>()
         0.62% (     0.13s) scala.collection.mutable.AbstractSeq.<init>()
         0.62% (     0.13s) org.neo4j.kernel.impl.cache.AutoLoadingCache.get()
         0.61% (     0.13s) scala.collection.TraversableLike$$anonfun$map$1.apply()
         0.61% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.assertTmOk()
         0.61% (     0.12s) org.neo4j.cypher.internal.compiler.v2_1.commands.EntityProducerFactory.<init>()
         0.61% (     0.12s) scala.collection.AbstractTraversable.<init>()
         0.61% (     0.12s) scala.collection.immutable.List.toStream()
         0.60% (     0.12s) org.neo4j.kernel.impl.nioneo.store.NodeStore.getRecord()
         0.57% (     0.12s) org.neo4j.kernel.impl.transaction.TxManager.getTransaction()
         0.37% (     0.08s) org.parboiled.scala.Parser$class.rule()
         0.06% (     0.01s) scala.util.DynamicVariable.value()
    

Upvotes: 3

Views: 354

Answers (1)

Stefan Armbruster
Stefan Armbruster

Reputation: 39926

Unfortunately the schema indexes (aka those created using CREATE INDEX ON :Label(property)) do not yet support larger than/smaller than conditions. Therefore Neo4j falls back to scan all nodes with the given label and filter on their properties. This is of course expensive.

I do see two different approaches to tackle this:

1) If your condition does always have a pre-defined maximum granularity e.g. 10s of USDs, you can build up an "amount-tree" similar to a time-tree (see http://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html).

2) if you don't know the granularity upfront the other option is to setup a manual or auto index for the amount property, see http://neo4j.com/docs/stable/indexing.html. The most easy thing is probably using auto index. In neo4j.properties set the following options:

node_auto_indexing=true
node_keys_indexable=amount

Note that this will not automatically add all existing transaction into that index, it just puts those in the index that have been written to since auto indexing is enabled.

You can do a explicit range query on the auto index using

MATCH t=node:node_auto_index("amount:[6000 TO 999999999]")
RETURN count(t)

Upvotes: 2

Related Questions