Reputation: 261
I am looking to import a csv where records in columns have null values (empty). There is a chance that any column could have an empty record, so ultimately the query should be able to handle any column. Here is an example of my table with and example of the expected graph output:
Group Capability Functionality Status Dev Input Output
Pipeline DataIngest Read CSV Complete Ben source.csv masterTable
Pipeline DataQuality Quality score Complete masterTable
Analytics Summary Minimum score In progress Jill masterTable dataSummary
Analytics Summary Maximum score In progress masterTable dataSummary
Analytics Prediction Future score
Output Report Bar chart Not started Ben dataSummary
Output Report Line chart Not started Fred
The following is what I have tried, which does not produce that I was expecting. The query produces the following issues:
CREATE CONSTRAINT ON (g:Group) ASSERT g.name is UNIQUE; CREATE CONSTRAINT ON (c:Capability) ASSERT c.name is UNIQUE; CREATE CONSTRAINT ON (f:Functionality) ASSERT f.name is UNIQUE; LOAD CSV WITH HEADERS FROM 'file:///MasterSheet.csv' AS line WITH line MERGE (g:Group {name: line.Group}) MERGE (c:Capability {name: line.Capability}) FOREACH(x IN CASE WHEN line.DevStage IS NULL THEN [] ELSE [1] END | MERGE (f:Functionality {name: line.Functionality, status: line.Status, developer: line.Dev})) MERGE (i:Data {name: line.Input}) MERGE (o:Data {name: line.Output}) CREATE UNIQUE (g)-[:PART_OF]->(c) CREATE UNIQUE (c)-[:HAS]->(f) CREATE UNIQUE (f)-[:INPUT]->(i) CREATE UNIQUE (f)-[:OUTPUT]->(o)
Upvotes: 1
Views: 887
Reputation: 66989
You have multiple issues.
A big issue is that your Comma Separated Values (CSV) file is badly formatted. By default, as the name implies, a CSV file should use the comma character as the value delimiter. Your data file has no commas at all. Here is a properly formatted version of your file:
Group,Capability,Functionality,Status,Dev,Input,Output
Pipeline,DataIngest,Read CSV,Complete,Ben,source.csv,masterTable
Pipeline,DataQuality,Quality score,Complete,,masterTable
Analytics,Summary,Minimum score,In progress,Jill,masterTable,dataSummary
Analytics,Summary,Maximum score,In progress,,masterTable,dataSummary
Analytics,Prediction,Future score
Output,Report,Bar chart,Not started,Ben,dataSummary
Output,Report,Line chart,Not started,Fred
Notice how "empty" values are represented by consecutive commas, or if they are at the end of a line they are just omitted. Values can have embedded spaces.
You have a typo. DevStage
should be Dev
.
The clauses that depend on f
must not be executed unless the f
node was created, and they must be executed when f
is in scope. f
is out of scope outside of the FOREACH
clause.
Recent versions of neo4j no longer support CREATE UNIQUE
, and MERGE
should be used instead.
Here is a query that solves issues 2 through 4, and relies on the above proper data format:
LOAD CSV WITH HEADERS FROM 'file:///MasterSheet.csv' AS line
MERGE (g:Group {name: line.Group})
MERGE (c:Capability {name: line.Capability})
MERGE (g)-[:PART_OF]->(c)
FOREACH(x IN CASE WHEN line.Dev IS NOT NULL THEN [1] END |
MERGE (f:Functionality {name: line.Functionality, status: line.Status, developer: line.Dev})
MERGE (c)-[:HAS]->(f)
FOREACH(x IN CASE WHEN line.Input IS NOT NULL THEN [1] END |
MERGE (i:Data {name: line.Input})
MERGE (f)-[:INPUT]->(i)
)
FOREACH(x IN CASE WHEN line.Output IS NOT NULL THEN [1] END |
MERGE (o:Data {name: line.Output})
MERGE (f)-[:OUTPUT]->(o)
)
)
NOTE: Your expected output data visualization is not possible with your sample data.
Upvotes: 2