Ryan Garnett
Ryan Garnett

Reputation: 261

Import csv into Neo4j where multiple columns have empty cells

I am looking to import a csv where records in columns have null values (empty). There is a chance that any column could have an empty record, so ultimately the query should be able to handle any column. Here is an example of my table with and example of the expected graph output:

Group       Capability      Functionality   Status        Dev    Input         Output  
Pipeline    DataIngest      Read CSV        Complete      Ben    source.csv    masterTable
Pipeline    DataQuality     Quality score   Complete             masterTable
Analytics   Summary         Minimum score   In progress   Jill   masterTable   dataSummary  
Analytics   Summary         Maximum score   In progress          masterTable   dataSummary 
Analytics   Prediction      Future score                  
Output      Report          Bar chart       Not started   Ben    dataSummary
Output      Report          Line chart      Not started   Fred

Expected graph

The following is what I have tried, which does not produce that I was expecting. The query produces the following issues:

  1. Functionality is not its own labelled node (it is part of capability)
  2. Properties are not stored within Functionality (this may be because of issue 1)
CREATE CONSTRAINT ON (g:Group) ASSERT g.name is UNIQUE;
CREATE CONSTRAINT ON (c:Capability) ASSERT c.name is UNIQUE;
CREATE CONSTRAINT ON (f:Functionality) ASSERT f.name is UNIQUE;
LOAD CSV WITH HEADERS FROM 'file:///MasterSheet.csv' AS line

WITH line
MERGE (g:Group {name: line.Group})
MERGE (c:Capability {name: line.Capability})
FOREACH(x IN CASE WHEN line.DevStage IS NULL THEN [] ELSE [1] END |
  MERGE (f:Functionality {name: line.Functionality, status: line.Status, developer: line.Dev}))
MERGE (i:Data {name: line.Input})
MERGE (o:Data {name: line.Output})

CREATE UNIQUE (g)-[:PART_OF]->(c)
CREATE UNIQUE (c)-[:HAS]->(f)
CREATE UNIQUE (f)-[:INPUT]->(i)
CREATE UNIQUE (f)-[:OUTPUT]->(o)

Upvotes: 1

Views: 887

Answers (1)

cybersam
cybersam

Reputation: 66989

You have multiple issues.

  1. A big issue is that your Comma Separated Values (CSV) file is badly formatted. By default, as the name implies, a CSV file should use the comma character as the value delimiter. Your data file has no commas at all. Here is a properly formatted version of your file:

    Group,Capability,Functionality,Status,Dev,Input,Output
    Pipeline,DataIngest,Read CSV,Complete,Ben,source.csv,masterTable
    Pipeline,DataQuality,Quality score,Complete,,masterTable
    Analytics,Summary,Minimum score,In progress,Jill,masterTable,dataSummary
    Analytics,Summary,Maximum score,In progress,,masterTable,dataSummary
    Analytics,Prediction,Future score
    Output,Report,Bar chart,Not started,Ben,dataSummary
    Output,Report,Line chart,Not started,Fred
    

    Notice how "empty" values are represented by consecutive commas, or if they are at the end of a line they are just omitted. Values can have embedded spaces.

  2. You have a typo. DevStage should be Dev.

  3. The clauses that depend on f must not be executed unless the f node was created, and they must be executed when f is in scope. f is out of scope outside of the FOREACH clause.

  4. Recent versions of neo4j no longer support CREATE UNIQUE, and MERGE should be used instead.

Here is a query that solves issues 2 through 4, and relies on the above proper data format:

LOAD CSV WITH HEADERS FROM 'file:///MasterSheet.csv' AS line
MERGE (g:Group {name: line.Group})
MERGE (c:Capability {name: line.Capability})
MERGE (g)-[:PART_OF]->(c)
FOREACH(x IN CASE WHEN line.Dev IS NOT NULL THEN [1] END |
  MERGE (f:Functionality {name: line.Functionality, status: line.Status, developer: line.Dev})
  MERGE (c)-[:HAS]->(f)
  FOREACH(x IN CASE WHEN line.Input IS NOT NULL THEN [1] END |
    MERGE (i:Data {name: line.Input})
    MERGE (f)-[:INPUT]->(i)
  )
  FOREACH(x IN CASE WHEN line.Output IS NOT NULL THEN [1] END |
    MERGE (o:Data {name: line.Output})
    MERGE (f)-[:OUTPUT]->(o)
  )
)

NOTE: Your expected output data visualization is not possible with your sample data.

Upvotes: 2

Related Questions