Neo4j Import-tool nodes count inconsistencies

Question

I have imported my dataset with Neo4j import-tool. The result looks like this:

IMPORT DONE in 3m 4s 715ms. 
Imported:
  9252082 nodes
  12347926 relationships
  100924808 properties
Peak memory usage: 604.47 MB

So the import reports more than 9 million nodes in total. I get the same number when I count the rows of the Spark data frames built from the CSV files that hold the data. However, when I run this query in Neo4j I get a much smaller number:

MATCH (n) RETURN count(*)

The resulting count is: 4446119

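I checked that the number of records differs for only one table, which is the biggest one. Neo4j holds 1087923 nodes for this table instead of the expected 5893886. That gap of 4805963 nodes accounts for the whole difference between the reported total and the queried total.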

So according to the import output all the nodes were imported, but the database does not contain them. What could be the reason for this behavior?

Cassie · Accepted Answer

In my opinion, the problem is in the member IDs. For some reason, multiple members in my dataset share the same ID. When the import runs with the --no-duplicates flag, these records are still processed and counted in the summary, but only one node per ID is actually inserted into the database.
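
To confirm this before re-importing, you can count how many IDs repeat in the source CSV. Below is a minimal sketch in plain Python, not something taken from the import tool itself. The column name memberId:ID and the sample rows are made up for illustration; swap in your own header and open your real file instead of the in-memory sample.

```python
import csv
import io
from collections import Counter


def duplicate_ids(rows, id_column):
    """Return {id: occurrences} for every ID that appears more than once."""
    counts = Counter(row[id_column] for row in rows)
    return {node_id: n for node_id, n in counts.items() if n > 1}


# Hypothetical sample standing in for the members CSV file.
# In practice: open("members.csv", newline="") instead of io.StringIO(...).
sample = io.StringIO(
    "memberId:ID,name\n"
    "1,Ann\n"
    "2,Bob\n"
    "1,Ann B.\n"
    "3,Cid\n"
    "2,Bobby\n"
    "2,Rob\n"
)

rows = list(csv.DictReader(sample))
dupes = duplicate_ids(rows, "memberId:ID")
total = len(rows)
distinct = len({row["memberId:ID"] for row in rows})

# If distinct is far below total, the missing nodes are most likely duplicates.
print(f"rows={total} distinct_ids={distinct} duplicated_ids={dupes}")
```

If the number of distinct IDs is close to the node count Neo4j reports for that table (1087923 in the question), duplicate IDs are very likely the cause.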
