Angelo Immediata

Reputation: 6954

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system using Neo4j and the Spatial plugin. We start from an OSM file, read it, and import nodes and relationships into our graph (a custom graph model).

Without the Neo4j batch inserter, importing a compressed OSM file (around 140 MB compressed, around 2 GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64-bit, quad core, 8 GB RAM. Please note that most of the time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, the file is read in around 7 minutes (I'm sure about this because our process first reads the file to store the correct OSM node IDs and then reads it again to create the Neo4j graph).

Obviously we need to improve the import process, so we are trying to use the batch inserter. I still need to measure how much faster it will be, but I expect an improvement. As a first step, I tried the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).

I list my software versions:

Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:

public class BatchInserterSinkTest implements Sink
{
    public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
    private static File basePath = new File("/home/angelo/Scrivania/neo4j");
    private static File dbPath = new File(basePath, "db");
    private GraphDatabaseService graphDb;
    private BatchInserter batchInserter;
//    private BatchInserterIndexProvider batchIndexService;
    private SpatialDatabaseService spatialDb;
    private SimplePointLayer spl;

    static
    {
        NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
        NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
        NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
        NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
        NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NEO4J_CFG.put( "dump_configuration", "true" );
    }

    @Override
    public void initialize(Map<String, Object> arg0)
    {
        batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
        graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
        spatialDb = new SpatialDatabaseService(graphDb);
        spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
        //batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
    }

    @Override
    public void complete()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void release()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void process(EntityContainer ec)
    {
        Entity entity = ec.getEntity();
        if (entity instanceof Node) {
            Node osmNodo = (Node)entity;
            org.neo4j.graphdb.Node graphNode = graphDb.createNode();
            graphNode.setProperty("osmId", osmNodo.getId());
            graphNode.setProperty("latitudine", osmNodo.getLatitude());
            graphNode.setProperty("longitudine", osmNodo.getLongitude());
            spl.add(graphNode);
        } else if (entity instanceof Way) {
            //do something with the way
        } else if (entity instanceof Relation) {
            //do something with the relation
        }
    }
}

Then I wrote the following test case:

public class BatchInserterTest
{
    private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());

    @Test
    public void batchInserter()
    {
        File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
        try
        {
            // Pick the reader and compression method from the file extension.
            boolean pbf = false;
            CompressionMethod compression = CompressionMethod.None;

            if (file.getName().endsWith(".pbf"))
            {
                pbf = true;
            }
            else if (file.getName().endsWith(".gz"))
            {
                compression = CompressionMethod.GZip;
            }
            else if (file.getName().endsWith(".bz2"))
            {
                compression = CompressionMethod.BZip2;
            }

            RunnableSource reader;

            if (pbf)
            {
                reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
            }
            else
            {
                reader = new XmlReader(file, false, compression);
            }

            reader.setSink(new BatchInserterSinkTest());

            // Run the reader on its own thread and wait for it to finish.
            Thread readerThread = new Thread(reader);
            readerThread.start();

            while (readerThread.isAlive())
            {
                try
                {
                    readerThread.join();
                }
                catch (InterruptedException e)
                {
                    /* do nothing */
                }
            }
        }
        catch (Exception e)
        {
            logger.error("Error creating the Neo4j database with the batch inserter", e);
        }
    }
}

By executing this code, I get this exception:

Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
 at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
 at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
 at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
 at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
 at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
 at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
 at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
 at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
 at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
 at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
 at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
 at java.lang.Thread.run(Thread.java:744)

This is related to this code:

spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");

So now I'm wondering: how can I use the batch inserter in my case? I have to add the created nodes to the SimplePointLayer, so how can I create that layer through the batch inserter's graph database service? Is there a small, simple example?

Any tip is really appreciated.

Cheers, Angelo

Upvotes: 2

Views: 524

Answers (1)

Craig Taverner

Reputation: 769

The OSMImporter class in the Neo4j Spatial code has an example of using the batch inserter to import OSM data. The main point is that the batch inserter is not really supported by Neo4j Spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It does not use the SimplePointLayer at all, since that does not support the batch inserter; instead, it creates the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter-compatible version yourself without too much trouble.

What I would recommend is that you use the batch inserter to create the layer and nodes with the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.
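The two-phase flow recommended above can be sketched roughly as follows. This is only an illustration under stated assumptions: `InMemoryNodeStore` and `InMemoryPointIndex` are hypothetical in-memory stand-ins, not Neo4j or Neo4j Spatial classes, so the sketch runs without a database. In a real import, phase 1 would go through the batch inserter and phase 2 through the embedded API and the point layer. Committing phase 2 in fixed-size batches is an assumption of this sketch, and the batch size is arbitrary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the store written in phase 1 (NOT a Neo4j class).
final class InMemoryNodeStore {
    private final List<Map<String, Object>> nodes = new ArrayList<>();

    // Same shape as a batch insert: properties in, numeric node id out.
    long createNode(Map<String, Object> properties) {
        nodes.add(new HashMap<>(properties));
        return nodes.size() - 1;
    }

    int size() { return nodes.size(); }

    Map<String, Object> get(long id) { return nodes.get((int) id); }
}

// Hypothetical stand-in for the point layer filled in phase 2 (NOT SimplePointLayer).
final class InMemoryPointIndex {
    private final List<Long> indexedIds = new ArrayList<>();
    private int commits = 0;

    void add(long nodeId, double lat, double lon) { indexedIds.add(nodeId); }

    // Stands in for committing one transaction.
    void commit() { commits++; }

    int size() { return indexedIds.size(); }

    int commits() { return commits; }
}

final class TwoPhaseImportSketch {
    // Phase 1: bulk-create nodes with their coordinates, no spatial index involved.
    static InMemoryNodeStore bulkLoad(double[][] latLon) {
        InMemoryNodeStore store = new InMemoryNodeStore();
        for (double[] p : latLon) {
            Map<String, Object> props = new HashMap<>();
            props.put("latitudine", p[0]);
            props.put("longitudine", p[1]);
            store.createNode(props);
        }
        return store;
    }

    // Phase 2: iterate over the stored nodes and index them, committing every batchSize nodes.
    static InMemoryPointIndex indexAll(InMemoryNodeStore store, int batchSize) {
        InMemoryPointIndex index = new InMemoryPointIndex();
        int pending = 0;
        for (long id = 0; id < store.size(); id++) {
            Map<String, Object> props = store.get(id);
            index.add(id, (Double) props.get("latitudine"), (Double) props.get("longitudine"));
            if (++pending == batchSize) {
                index.commit();
                pending = 0;
            }
        }
        if (pending > 0) {
            index.commit(); // commit the final partial batch
        }
        return index;
    }
}
```

The point of separating the phases is that the bulk load never touches the spatial layer, which is the part that fails with the batch inserter in the question; the spatial index is only built afterwards, once a normal database service is available.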

Upvotes: 2
