Reputation: 3125
I have a csv, where date is represented in two possible ways within the same column, as in:
or
The second form is used by others who do data entry, when MM/DD is unknown.
I'd like to extract the month, day, and year as separate properties on a node.
I wrote:
LOAD CSV WITH HEADERS FROM
'file:///...some_csv.csv' AS line
WITH
line
MERGE (p:Person {
subject_person_id: line.subject_person_id
})
ON CREATE SET p.month_of_birth = SPLIT(line.date_of_birth, '/')[0]
ON MATCH SET p.month_of_birth = SPLIT(line.date_of_birth, '/')[0]
which works fine for the first case (MM/DD/YYYY) but falls short with just YYYY. In that second case, the month is set to YYYY since that's the 0th element.
I'm thinking, there has to be some testing for which date format exists, then conditionally doing a SET depending on the results?
I looked at the syntax for FOREACH
but tried to apply it but it didn't seem to be the right approach. Does anyone have any advice? Thanks!
Upvotes: 2
Views: 87
Reputation: 11216
You could conditionally set the value in a case statement. If you get a bad value then you can set it to null
. Something like this...
LOAD CSV WITH HEADERS FROM
'file:///...some_csv.csv' AS line
WITH line
, case
when size(line.date_of_birth) = 10 then
SPLIT(line.date_of_birth, '/')[0]
else
NULL
end as month
MERGE (p:Person { subject_person_id: line.subject_person_id})
ON CREATE SET p.month_of_birth = month
ON MATCH SET p.month_of_birth = month
Upvotes: 1
Reputation: 29172
I think that you need to fill an array of empty elements. For example:
WITH SPLIT( 'MM/DD/YYYY', '/' ) as mdy1
SPLIT( 'YYYY', '/' ) as mdy2,
RETURN ( EXTRACT( x in RANGE( 0, 2 - size( mdy1 ) ) | "" ) + mdy1 ) [0] as m1,
( EXTRACT( x in RANGE( 0, 2 - size( mdy2 ) ) | "" ) + mdy2 ) [0] as m2
Hmm... Not bad. But it is not elegant. Ok, let's try to simplify:
WITH "MM/DD/YYYY" as mdy1,
"YYYY" as mdy2
RETURN ( ["", "", ""] + SPLIT( mdy1, "/" ) ) [ -3.. ][0] as m1,
( ["", "", ""] + SPLIT( mdy2, "/" ) ) [ -3.. ][0] as m2
We can still improve? You never know what you can do till you try:
WITH "MM/DD/YYYY" as mdy1,
"YYYY" as mdy2
RETURN ( SPLIT( "//" + mdy1, "/" ) ) [ -3.. ][0] as m1,
( SPLIT( "//" + mdy2, "/" ) ) [ -3.. ][0] as m2
Upvotes: 2