Reputation: 331
I'm currently building an app that models various geographic features (roads, towns, highways, etc.) in a graph database. The geographic data is all in GeoJSON format.
There is no LOAD JSON function in the Cypher language, so loading JSON files requires passing the fully parsed JavaScript object as a query parameter and using UNWIND to access the arrayed properties and objects when creating nodes. (This guide helped me a lot to get started: Loading JSON in neo4j.) Since GeoJSON is just a spec built on JSON conventions, this load-JSON approach works great for reasonably sized files.
However, geographic data files can be massive. Some of the files I'm trying to import range from 100 features to 200,000 features.
The problem I'm running into is that with these very large files, the query will not MERGE any nodes in the database until it has completely processed the entire file. For large files this often exceeds the 3600-second timeout limit set in Neo4j, so I end up waiting an hour only to find out that I have no new data in my database.
I know that with some data, the current recommendation is to convert it to CSV and then use the optimization of LOAD CSV. However, I don't believe it is easy to condense GeoJSON into CSV.
Is it possible to send the data from a very large JSON/GeoJSON file over in smaller batches so that Neo4j will commit the data intermittently?
To import my data, I built a simple Express app that connects to my Neo4j database via the Bolt protocol (using the official JavaScript driver). My GeoJSON files all have a well-known text (WKT) property for each feature so that I can make use of neo4j-spatial.
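For reference, each file is a standard GeoJSON FeatureCollection, so the parsed object I pass to the driver has roughly this shape (the geometry and WKT values here are just illustrative):

// Illustrative shape of the parsed GeoJSON; real files hold anywhere
// from ~100 to ~200,000 features, each with a wkt property.
var jsonObject = {
    type: "FeatureCollection",
    features: [
        {
            type: "Feature",
            geometry: { type: "LineString", coordinates: [[-73.97, 40.78], [-73.96, 40.77]] },
            properties: { wkt: "LINESTRING (-73.97 40.78, -73.96 40.77)" }
        }
        // ...more features
    ]
};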
Here's an example of the code I would use to import a set of road data:
session.run("WITH {json} as data UNWIND data.features as features MERGE (r:Road {wkt:features.properties.wkt})", {json: jsonObject})
.then(function (result) {
var records = [];
result.records.forEach((value) => {
records.push(value);
});
console.log("query completed");
session.close();
driver.close();
return records;
})
.catch((error) => {
console.log(error);
// Close out the session objects
session.close();
driver.close();
});
As you can see, I'm passing the entire parsed GeoJSON object as a parameter to my Cypher query. Is there a better way to do this with very large files to avoid the timeout issue I'm experiencing?
Upvotes: 6
Views: 1548
Reputation: 1723
This answer might be helpful here: https://stackoverflow.com/a/59617967/1967693
apoc.load.jsonArray() (or, as below, apoc.load.json() with a JSON path) will stream the values of the given JSON file. These can then be used as the data source for batching via apoc.periodic.iterate.
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('https://dummyjson.com/products', '$.features') YIELD value AS features",
  "UNWIND features AS feature MERGE (r:Road {wkt: feature.properties.wkt})",
  {batchSize: 1000, parallel: true}
)
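If APOC is not an option, a similar effect can be had on the client side by splitting the features array into chunks and committing each chunk in its own transaction. Here is a minimal sketch against the setup in the question (the chunk size, connection details, and the already-parsed jsonObject are assumptions):

// Minimal sketch: commit the GeoJSON features in chunks so each chunk
// is its own auto-committed transaction and earlier chunks stay in the
// database even if a later one fails. Chunk size and credentials are assumptions.
var neo4j = require('neo4j-driver').v1;
var driver = neo4j.driver('bolt://localhost', neo4j.auth.basic('neo4j', 'password'));

async function importRoads(jsonObject) {
    var session = driver.session();
    var features = jsonObject.features;
    var chunkSize = 1000;
    try {
        for (var i = 0; i < features.length; i += chunkSize) {
            var chunk = features.slice(i, i + chunkSize);
            // Each run() is a separate auto-commit transaction.
            await session.run(
                "UNWIND {features} AS feature MERGE (r:Road {wkt: feature.properties.wkt})",
                {features: chunk}
            );
            console.log("committed " + (i + chunk.length) + " of " + features.length + " features");
        }
    } finally {
        session.close();
        driver.close();
    }
}

On newer Neo4j versions the {features} parameter syntax would be written $features instead.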
Upvotes: 0