I'm trying to load a CSV file (25 MB, 150,000 rows, 22 columns) into a Neo4j graph with py2neo, using a flights data model.
The Cypher query is a single statement that creates the nodes (Airport, City, Flight and Plane) and the relationships between them. But when I run the code, it takes forever, even with `USING PERIODIC COMMIT`.
I'm not sure the Cypher query I've written is optimized; it might be the source of the slowness. For 10,000 rows it took around 10 minutes to build the graph. Can anyone help? Here is the code:
```python
from py2neo import Graph


def importFromCSVtoNeo(graph):
    query = '''
        USING PERIODIC COMMIT 1000
        LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
        WITH row
        MERGE (c_departure:City {cityName: row.cityName_departure})
        MERGE (a_departure:Airport {airportName: row.airportName_departure})
        MERGE (f_segment1:Flight {airline: row.airline1})
          ON CREATE SET f_segment1.class = row.class1,
                        f_segment1.outboundclassgroup = row.outboundclassgroup1
        MERGE (a_departure)-[:IN]->(c_departure)
        MERGE (c_departure)-[:HAS]->(a_departure)
        MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)
        MERGE (c_transfer:City {cityName: row.transferCityName})
        MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
        MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
        MERGE (a_transfer)-[:IN]->(c_transfer)
        MERGE (c_transfer)-[:HAS]->(a_transfer)
        MERGE (c_arrival:City {cityName: row.cityName_arrival})
        MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
        MERGE (f_segment2:Flight {airline: row.airline2})
          ON CREATE SET f_segment2.class = row.class2,
                        f_segment2.outboundclassgroup = row.outboundclassgroup2
        MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
        MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
        MERGE (a_arrival)-[:IN]->(c_arrival)
        MERGE (c_arrival)-[:HAS]->(a_arrival)
        MERGE (p:Plane {saleprice: row.saleprice})
          ON CREATE SET p.depart = row.cityName_departure,
                        p.destination = row.cityName_arrival,
                        p.salechannel = row.salechannel,
                        p.planeDuration = row.planeDuration
        MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
        MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''
    graph.run(query)


if __name__ == '__main__':
    graph = Graph()
    importFromCSVtoNeo(graph)
```
I've also tried running it in batch mode, but performance didn't improve. I'd appreciate any comments or suggestions. Thanks!
I would create indexes on the node properties before launching the script, so that Neo4j can use them for fast lookups during MERGE (which has to match existing nodes row by row). For instance, for the first node property I would use:
```cypher
CREATE INDEX ON :City(cityName)
```
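and so on for the other merged properties (`Airport(airportName)`, `Flight(airline)`, `Plane(saleprice)`). Note that property names are case-sensitive, so the index must use exactly the name in the MERGE. You can create the indexes directly from py2neo, each in its own `graph.run()` call.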
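A minimal sketch of doing this from Python, assuming Neo4j 3.x (the `CREATE INDEX ON :Label(property)` form used above; newer releases use a different `CREATE INDEX ... FOR ...` syntax). The `(label, property)` pairs are read off the MERGE clauses in the question's query; `MERGE_KEYS`, `index_statements` and `create_indexes` are hypothetical names, not part of py2neo. The `run` parameter is any callable that executes a Cypher string, such as a py2neo `graph.run`.

```python
# Hypothetical helpers (not part of py2neo), shown as a sketch.
# (label, property) pairs taken from the MERGE clauses in the import query.
# Property names are case-sensitive and must match the query exactly.
MERGE_KEYS = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(keys=MERGE_KEYS):
    """Build one Neo4j 3.x-style CREATE INDEX statement per (label, property)."""
    return ["CREATE INDEX ON :{}({})".format(label, prop) for label, prop in keys]


def create_indexes(run, keys=MERGE_KEYS):
    """Execute each statement through `run` (e.g. a py2neo graph.run),
    one call per index, and return the statements that were sent."""
    statements = index_statements(keys)
    for statement in statements:
        run(statement)
    return statements
```

With py2neo this would be called as `create_indexes(graph.run)`, once, before `importFromCSVtoNeo(graph)`. Sending each statement in its own `graph.run()` call keeps the schema changes in separate auto-commit transactions from the data import, since Neo4j does not allow schema and data updates in the same transaction.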