Seth Singson-Robbins
Seth Singson-Robbins

Reputation: 21

Snowflake/SQL Date Time - Weird Format

I have a dataset that has a long format and I need to update the data to a normal date type in Snowflake. The data in the () looks to change slightly too. Does anybody know how update the string with two examples below to a date type?

Upvotes: 2

Views: 565

Answers (2)

Eric Lin
Eric Lin

Reputation: 1510

Using regular expressions:

SELECT 
    column1
    ,TRY_TO_TIMESTAMP_TZ(
        regexp_replace(column1, '(.*) GMT([^(]*) \(.*\)', '\\1 \\2'), 
        'DY Mon DD YYYY HH24:MI:SS TZHTZM'
    ) as timestamp
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
    ,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;

+-----------------------------------------------------------+-------------------------------+
| COLUMN1                                                   | TIMESTAMP                     |
|-----------------------------------------------------------+-------------------------------|
| Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) | 2020-05-16 11:12:33.000 -0400 |
| Fri Jan 22 2021 10:43:33 GMT-0500 (EST)                   | 2021-01-22 10:43:33.000 -0500 |
| Wed Nov 25 2020 19:56:46 GMT+0400 (+04)                   | 2020-11-25 19:56:46.000 +0400 |
| Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)   | 2017-08-19 01:23:39.000 +0530 |
+-----------------------------------------------------------+-------------------------------+

Upvotes: 0

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

So this decomposed set of steps shows one way to handle these:

SELECT 
    column1
    ,split(column1, ' ') as parts
    ,ARRAY_SLICE(parts,1,6) as sub_parts
    ,ARRAY_TO_STRING(sub_parts, ' ') as new_str
    ,TRY_TO_TIMESTAMP_TZ(new_str, 'Mon DD YYYY HH24:MI:SS GMTTZHTZM')
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
    ,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;

gives:

COLUMN1 PARTS SUB_PARTS NEW_STR ANSWER
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) [ "Sat", "May", "16", "2020", "11:12:33", "GMT-0400", "(Eastern", "Daylight", "Time)" ] [ "May", "16", "2020", "11:12:33", "GMT-0400" ] May 16 2020 11:12:33 GMT-0400 2020-05-16 11:12:33.000 -0400
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) [ "Fri", "Jan", "22", "2021", "10:43:33", "GMT-0500", "(EST)" ] [ "Jan", "22", "2021", "10:43:33", "GMT-0500" ] Jan 22 2021 10:43:33 GMT-0500 2021-01-22 10:43:33.000 -0500
Wed Nov 25 2020 19:56:46 GMT+0400 (+04) [ "Wed", "Nov", "25", "2020", "19:56:46", "GMT+0400", "(+04)" ] [ "Nov", "25", "2020", "19:56:46", "GMT+0400" ] Nov 25 2020 19:56:46 GMT+0400 2020-11-25 19:56:46.000 +0400
Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time) [ "Sat", "Aug", "19", "2017", "01:23:39", "GMT+0530", "(India", "Standard", "Time)" ] [ "Aug", "19", "2017", "01:23:39", "GMT+0530" ] Aug 19 2017 01:23:39 GMT+0530 2017-08-19 01:23:39.000 +0530

smashed all together:

SELECT 
    column1
    ,TRY_TO_TIMESTAMP_TZ(ARRAY_TO_STRING(ARRAY_SLICE(split(column1, ' '),1,6), ' '), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer
FROM VALUES 
     ('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
    ,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
    ,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
    ,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;

gives:

COLUMN1 ANSWER
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) 2020-05-16 11:12:33.000 -0400
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) 2021-01-22 10:43:33.000 -0500
Wed Nov 25 2020 19:56:46 GMT+0400 (+04) 2020-11-25 19:56:46.000 +0400
Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time) 2017-08-19 01:23:39.000 +0530

Now in the sub_parts step I skip the day of the week, and stop before the timezone, the former can be handled but the latter cannot.

wait a second, the input is constant width, thus we can constant width SUBSTR stepping:

,TRY_TO_TIMESTAMP_TZ(substr(column1, 5, 29), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer  

Upvotes: 1

Related Questions