Reputation: 924
I'm working on a project that involves importing data into Memgraph, and I have a dataset in CSV format. What is the most efficient way to import this CSV file into Memgraph? Are there any best practices or tips I should be aware of while doing this?
Upvotes: 0
Views: 533
Reputation: 924
Here is an example query to import data from a CSV file with a header:
LOAD CSV FROM "/path/to/your/data.csv" WITH HEADER AS row
CREATE (p:Person) SET p += row;
When you have data in a CSV format and want to import it into Memgraph, the LOAD CSV
Cypher clause is an efficient solution. This clause reads rows from a specified CSV file and either populates an empty database or appends new data to an existing dataset. The Excel CSV dialect is supported by Memgraph as it is the most prevalent.
Please note that the LOAD CSV
clause cannot be used with Memgraph Cloud instances since currently, there is no method for making files accessible by Memgraph.
Here is the syntax for the LOAD CSV
clause:
LOAD CSV FROM <csv-file-path> ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER <delimiter-string>] [QUOTE <quote-string>] AS <variable-name>
Replace <csv-file-path>
with the path to your CSV file. If you are running Memgraph with Docker, you will need to copy the files from your local directory into the Docker container where Memgraph can access them.
Choose either ( WITH | NO ) HEADER
to specify whether your CSV file has a header. If you opt for WITH HEADER
, the first line will be parsed as a header, and the remaining rows will be parsed as regular rows. If you select NO HEADER
, each row is parsed as a list of values, and there are no restrictions on the number of values a row may contain.
Use the IGNORE BAD
flag to determine whether rows with errors should be ignored. If set, the parser will attempt to return the first valid row from the CSV file. If not set, an exception will be thrown for the first invalid row encountered.
The DELIMITER <delimiter-string>
option allows you to specify the CSV delimiter character, with the default being ,
. The QUOTE <quote-string>
option enables you to specify the CSV quote character, with the default being "
. Finally, replace <variable-name>
with a symbolic name representing the variable to which the contents of the parsed row will be bound.
It is important to note that the LOAD CSV
clause is not a standalone clause, meaning that it must be accompanied by at least one other clause in a valid query. Moreover, the LOAD CSV
clause can only be used once per query.
To speed up the data import process, consider creating indexes on nodes or node properties once they are imported:
CREATE INDEX ON Node(id);
Upvotes: 0