Reputation: 21
I have a dataset that has a long format and I need to update the data to a normal date type in Snowflake. The data in the () looks to change slightly too. Does anybody know how update the string with two examples below to a date type?
Upvotes: 2
Views: 565
Reputation: 1510
Using regular expressions:
SELECT
column1
,TRY_TO_TIMESTAMP_TZ(
regexp_replace(column1, '(.*) GMT([^(]*) \(.*\)', '\\1 \\2'),
'DY Mon DD YYYY HH24:MI:SS TZHTZM'
) as timestamp
FROM VALUES
('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;
+-----------------------------------------------------------+-------------------------------+
| COLUMN1 | TIMESTAMP |
|-----------------------------------------------------------+-------------------------------|
| Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) | 2020-05-16 11:12:33.000 -0400 |
| Fri Jan 22 2021 10:43:33 GMT-0500 (EST) | 2021-01-22 10:43:33.000 -0500 |
| Wed Nov 25 2020 19:56:46 GMT+0400 (+04) | 2020-11-25 19:56:46.000 +0400 |
| Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time) | 2017-08-19 01:23:39.000 +0530 |
+-----------------------------------------------------------+-------------------------------+
Upvotes: 0
Reputation: 25903
So this decomposed set of steps shows one way to handle these:
SELECT
column1
,split(column1, ' ') as parts
,ARRAY_SLICE(parts,1,6) as sub_parts
,ARRAY_TO_STRING(sub_parts, ' ') as new_str
,TRY_TO_TIMESTAMP_TZ(new_str, 'Mon DD YYYY HH24:MI:SS GMTTZHTZM')
FROM VALUES
('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;
gives:
COLUMN1 | PARTS | SUB_PARTS | NEW_STR | ANSWER |
---|---|---|---|---|
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) | [ "Sat", "May", "16", "2020", "11:12:33", "GMT-0400", "(Eastern", "Daylight", "Time)" ] | [ "May", "16", "2020", "11:12:33", "GMT-0400" ] | May 16 2020 11:12:33 GMT-0400 | 2020-05-16 11:12:33.000 -0400 |
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) | [ "Fri", "Jan", "22", "2021", "10:43:33", "GMT-0500", "(EST)" ] | [ "Jan", "22", "2021", "10:43:33", "GMT-0500" ] | Jan 22 2021 10:43:33 GMT-0500 | 2021-01-22 10:43:33.000 -0500 |
Wed Nov 25 2020 19:56:46 GMT+0400 (+04) | [ "Wed", "Nov", "25", "2020", "19:56:46", "GMT+0400", "(+04)" ] | [ "Nov", "25", "2020", "19:56:46", "GMT+0400" ] | Nov 25 2020 19:56:46 GMT+0400 | 2020-11-25 19:56:46.000 +0400 |
Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time) | [ "Sat", "Aug", "19", "2017", "01:23:39", "GMT+0530", "(India", "Standard", "Time)" ] | [ "Aug", "19", "2017", "01:23:39", "GMT+0530" ] | Aug 19 2017 01:23:39 GMT+0530 | 2017-08-19 01:23:39.000 +0530 |
smashed all together:
SELECT
column1
,TRY_TO_TIMESTAMP_TZ(ARRAY_TO_STRING(ARRAY_SLICE(split(column1, ' '),1,6), ' '), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer
FROM VALUES
('Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time)')
,('Fri Jan 22 2021 10:43:33 GMT-0500 (EST)')
,('Wed Nov 25 2020 19:56:46 GMT+0400 (+04)')
,('Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time)')
;
gives:
COLUMN1 | ANSWER |
---|---|
Sat May 16 2020 11:12:33 GMT-0400 (Eastern Daylight Time) | 2020-05-16 11:12:33.000 -0400 |
Fri Jan 22 2021 10:43:33 GMT-0500 (EST) | 2021-01-22 10:43:33.000 -0500 |
Wed Nov 25 2020 19:56:46 GMT+0400 (+04) | 2020-11-25 19:56:46.000 +0400 |
Sat Aug 19 2017 01:23:39 GMT+0530 (India Standard Time) | 2017-08-19 01:23:39.000 +0530 |
Now in the sub_parts
step I skip the day of the week, and stop before the timezone, the former can be handled but the latter cannot.
wait a second, the input is constant width, thus we can constant width SUBSTR stepping:
,TRY_TO_TIMESTAMP_TZ(substr(column1, 5, 29), 'Mon DD YYYY HH24:MI:SS GMTTZHTZM') as answer
Upvotes: 1