Matt

Reputation: 468

How can I export separate node and edge files from single cypher query?

I'm trying to write a single Cypher query that exports two CSV files, a node list and an edge list, which can then be analyzed with a library like igraph. My database has only a single relationship type:

(a:paper)-[:REFERENCES]->(b:paper)

Each node has multiple properties (title, author, etc.). The unique identifier is paper_id.

I've been trying to use the APOC procedures apoc.export.csv.query and apoc.export.csv.data.

I can export the nodes and edges to a single file:

MATCH (n:paper)<-[r:REFERENCES]-(m:paper)
WHERE n.paper_id = '1234'
WITH COLLECT(m) AS paper, COLLECT(r) AS references
CALL apoc.export.csv.data(
  paper,
  references,
  'network.csv',
  {}
) YIELD file, nodes, relationships
RETURN file, nodes, relationships

Or I can export just the edge list:

CALL apoc.export.csv.query(
  "MATCH (n:paper)<-[r:REFERENCES]-(m:paper) WHERE n.paper_id = '1234' RETURN n.paper_id AS From, m.paper_id AS To",
  'edge.csv',
  {}
) YIELD file
RETURN file;

Ideally, I would like a single query which would produce two files.

An edge list:

    From  | To
    1234  | 4567
    1234  | 8910

And a node list:

    paper_id | title           | author
    1234     | "a title"       | "a name"
    4567     | "another title" | "another name"
    8910     | "a third title" | "third name"

Neo4j CE 3.4.11

Upvotes: 2

Views: 1032

Answers (1)

Pablissimo

Reputation: 2905

You don't have much control over the output format of apoc.export.csv.data (or any of its related procedures): the files will pretty much always contain internal node IDs (and all the other metadata columns) instead of just your desired unique values.

Still - assuming you can do some futzing about on the import side you can export two files, one with edges and one with nodes.

Assuming you want enough information in the nodes.csv file to recreate the graph (i.e. you need both the papers that reference the target paper and the target paper itself), here is the equivalent query against the sample Movies database:

MATCH (movie: Movie { title: 'Top Gun' })<-[acted_in: ACTED_IN]-(actor: Person)
WITH collect(distinct actor) + movie as nodes, collect(distinct acted_in) as relationships
CALL apoc.export.csv.data([], relationships, 'edges.csv', {}) YIELD file as edgefile
CALL apoc.export.csv.data(nodes, [], 'nodes.csv', {}) YIELD file as nodefile
RETURN edgefile, nodefile

This yields two files in the import folder, with contents as below. It's not clear if this actually achieves what you want, since the only consistent identifier across the two files is the internal node ID (which is sufficient to rebuild an equivalent graph).

nodes.csv

"_id","_labels","born","name","released","tagline","title","_start","_end","_type"
"31",":Person","1959","Val Kilmer","","","",,,
"34",":Person","1961","Meg Ryan","","","",,,
"33",":Person","1933","Tom Skerritt","","","",,,
"30",":Person","1957","Kelly McGillis","","","",,,
"16",":Person","1962","Tom Cruise","","","",,,
"32",":Person","1962","Anthony Edwards","","","",,,
"29",":Movie","","","Top Gun","1986","I feel the need, the need for speed.",,,

edges.csv

"_id","_labels","_start","_end","_type","roles"
,,"16","29","ACTED_IN","[""Maverick""]"
,,"30","29","ACTED_IN","[""Charlie""]"
,,"31","29","ACTED_IN","[""Iceman""]"
,,"32","29","ACTED_IN","[""Goose""]"
,,"33","29","ACTED_IN","[""Viper""]"
,,"34","29","ACTED_IN","[""Carole""]"
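
The "futzing about on the import side" could be done with a short script. Below is a minimal Python sketch, not part of the original answer, that joins the two exported files on the internal _id column and writes a node list and an edge list keyed by your own identifier. It assumes nodes.csv contains a paper_id column (as it would for the :paper nodes in the question; the Movies sample above has no such column). The function names and output file names are hypothetical. Note that From is taken from _start and To from _end, so it follows the direction of the relationship; swap them if you want the opposite.

```python
import csv


def read_nodes(nodes_file):
    """Read the APOC node export into a list of dicts (one per node row)."""
    return list(csv.DictReader(nodes_file))


def write_node_list(nodes, out_file, columns):
    """Write only the chosen property columns, dropping _id, _labels, etc."""
    writer = csv.DictWriter(out_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(nodes)


def write_edge_list(nodes, edges_file, out_file, key_column):
    """Translate internal _start/_end IDs into key_column values.

    Returns the number of edges written. Raises KeyError if an edge
    refers to a node that is missing from the node export.
    """
    # Map APOC's internal node ID to the business key (assumed: paper_id).
    id_map = {row["_id"]: row[key_column] for row in nodes}
    writer = csv.writer(out_file)
    writer.writerow(["From", "To"])
    written = 0
    for row in csv.DictReader(edges_file):
        writer.writerow([id_map[row["_start"]], id_map[row["_end"]]])
        written += 1
    return written


# Example usage (file and column names are assumptions):
#
# with open("nodes.csv", newline="", encoding="utf-8") as f:
#     nodes = read_nodes(f)
# with open("node_list.csv", "w", newline="", encoding="utf-8") as out:
#     write_node_list(nodes, out, ["paper_id", "title", "author"])
# with open("edges.csv", newline="", encoding="utf-8") as f, \
#         open("edge_list.csv", "w", newline="", encoding="utf-8") as out:
#     write_edge_list(nodes, f, out, "paper_id")
```

The functions take open file objects rather than paths so that the same code works on files from the import folder or on in-memory strings. The resulting two files should be in a shape that igraph or a similar library can read directly.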

Upvotes: 3
