Reputation: 3099
I imported my dataset with the Neo4j import tool. The result looks like this:
IMPORT DONE in 3m 4s 715ms.
Imported:
9252082 nodes
12347926 relationships
100924808 properties
Peak memory usage: 604.47 MB
So the import reports more than 9 million nodes overall. I get the same number when I count the rows of the Spark data frames loaded from the CSV files that hold the data. However, when I run this query in Neo4j, I get a smaller number:
MATCH (n) return count(*)
The resulting count is: 4446119
I checked the counts per table, and they differ for only one table, which is the biggest one. Instead of 5893886 records, Neo4j reports 1087923 for this table. The shortfall for this table, 4805963, equals the overall shortfall (9252082 - 4446119).
So according to the import summary all the nodes were imported, but the counts in Neo4j do not reflect this. What could be the reason for this behavior?
Upvotes: 1
Views: 36
Reputation: 3099
I think the problem is in the member IDs. For some reason, multiple members in my dataset share the same ID. That is why, when the import runs with the --no-duplicates flag, these records are processed but are not actually inserted into the database.
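One way to check this before importing is to count how many distinct IDs the CSV file really contains. The following is only a minimal sketch in Python: the file name members.csv and the ID column name memberId:ID are assumptions for illustration, not the names from my actual dataset. If the number of distinct IDs matches the node count Neo4j reports for that table, duplicate IDs would account for the difference.

```python
import csv
from collections import Counter


def count_duplicate_ids(rows, id_column):
    """Return (total_rows, distinct_ids, duplicated) for an iterable of dict rows.

    `duplicated` maps each ID that occurs more than once to its occurrence count.
    """
    counts = Counter(row[id_column] for row in rows)
    total_rows = sum(counts.values())
    distinct_ids = len(counts)
    duplicated = {key: n for key, n in counts.items() if n > 1}
    return total_rows, distinct_ids, duplicated


def count_duplicate_ids_in_csv(path, id_column):
    """Same check, reading the rows from a CSV file with a header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_duplicate_ids(csv.DictReader(handle), id_column)


# Hypothetical usage (file and column names are assumptions):
# total, distinct, dupes = count_duplicate_ids_in_csv("members.csv", "memberId:ID")
# print(total, distinct, len(dupes))
```

Here total - distinct is the number of rows that would be dropped as duplicates, so it can be compared directly with the gap between the import summary and the count in Neo4j.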
Upvotes: 0