Monica Heddneck
Monica Heddneck

Reputation: 3125

How to read in a column where date is represented either MM/DD/YYYY, or just YYYY in cypher for Neo4j

I have a csv, where date is represented in two possible ways within the same column, as in:

or

The second form is used by others who do data entry, when MM/DD is unknown.

I'd like to extract the month, day, and year as separate properties on a node.

I wrote:

LOAD CSV WITH HEADERS FROM
'file:///...some_csv.csv' AS line
WITH
line
MERGE (p:Person {
  subject_person_id: line.subject_person_id
})
  ON CREATE SET p.month_of_birth = SPLIT(line.date_of_birth, '/')[0]
  ON MATCH SET p.month_of_birth = SPLIT(line.date_of_birth, '/')[0]  

which works fine for the first case (MM/DD/YYYY) but falls short with just YYYY. In that second case, the month is set to YYYY since that's the 0th element.

I'm thinking, there has to be some testing for which date format exists, then conditionally doing a SET depending on the results?

I looked at the syntax for FOREACH but tried to apply it but it didn't seem to be the right approach. Does anyone have any advice? Thanks!

Upvotes: 2

Views: 87

Answers (2)

Dave Bennett
Dave Bennett

Reputation: 11216

You could conditionally set the value in a case statement. If you get a bad value then you can set it to null. Something like this...

LOAD CSV WITH HEADERS FROM
'file:///...some_csv.csv' AS line
WITH line
, case 
    when size(line.date_of_birth) = 10 then 
       SPLIT(line.date_of_birth, '/')[0]
    else
       NULL
    end as month
MERGE (p:Person { subject_person_id: line.subject_person_id})
  ON CREATE SET p.month_of_birth = month 
  ON MATCH SET p.month_of_birth = month

Upvotes: 1

stdob--
stdob--

Reputation: 29172

I think that you need to fill an array of empty elements. For example:

WITH SPLIT( 'MM/DD/YYYY', '/' ) as mdy1
     SPLIT( 'YYYY', '/'       ) as mdy2,
RETURN ( EXTRACT( x in RANGE( 0, 2 - size( mdy1 ) ) | "" ) + mdy1 ) [0] as m1,
       ( EXTRACT( x in RANGE( 0, 2 - size( mdy2 ) ) | "" ) + mdy2 ) [0] as m2

Hmm... Not bad. But it is not elegant. Ok, let's try to simplify:

WITH "MM/DD/YYYY" as mdy1, 
     "YYYY"       as mdy2   
RETURN ( ["", "", ""] + SPLIT( mdy1, "/" ) ) [ -3.. ][0] as m1,
       ( ["", "", ""] + SPLIT( mdy2, "/" ) ) [ -3.. ][0] as m2

We can still improve? You never know what you can do till you try:

WITH "MM/DD/YYYY" as mdy1, 
     "YYYY"       as mdy2    
RETURN ( SPLIT( "//" + mdy1, "/" ) ) [ -3.. ][0] as m1,
       ( SPLIT( "//" + mdy2, "/" ) ) [ -3.. ][0] as m2

Upvotes: 2

Related Questions