Shivi Bhatia
Shivi Bhatia

Reputation: 189

Extracting string of dates in a big query

I have used the below code

select --split( INDIVIDUAL_NAMES, '->') as Assigned_to_individual,  

regexp_extract( split( INDIVIDUAL_NAMES, '->'),r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d)') from [table name ]where _ID  in('071632')  

Here we need to extract date and i have given it a try by using d so it gives me numeric fields but this is not giving me complete date.

Upvotes: 0

Views: 1905

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Below is for BigQuery Standard SQL

#standardSQL
WITH yourTable AS ( 
  SELECT 1 AS id, 'Start -> (11/11/2016 08:24:24 AM) -> Akumar (11/11/2016 11:15:33 AM) -> Akumar (01/06/2017 08:08:44 PM) -> Akumar (01/30/2017 10:34:33 AM) -> Akumar (03/15/2017 02:10:12 PM) -> Akumar (03/23/2017 12:42:52 PM) -> Akumar (06/20/2017 12:52:27 PM) -> (06/27/2017 05:30:48 PM) -> Sneha Singh (06/28/2017 03:11:34 AM)' AS INDIVIDUAL_NAMES
) 
SELECT 
  id, 
  LTRIM(SPLIT(CONCAT(' ', item), ' (')[OFFSET(0)]) AS name, 
  REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)') AS date_as_string, 
  PARSE_DATETIME('%m/%d/%Y %r', REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)')) AS date_as_datetime,
  PARSE_TIMESTAMP('%m/%d/%Y %r', REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)')) AS date_as_timestamp
FROM yourTable, UNNEST(SPLIT(INDIVIDUAL_NAMES, ' -> ')) AS item 
-- ORDER BY 3 

above can be slightly simplified to remove redundant portions - for example as

#standardSQL
WITH yourTable AS ( 
  SELECT 1 AS id, 'Start -> (11/11/2016 08:24:24 AM) -> Akumar (11/11/2016 11:15:33 AM) -> Akumar (01/06/2017 08:08:44 PM) -> Akumar (01/30/2017 10:34:33 AM) -> Akumar (03/15/2017 02:10:12 PM) -> Akumar (03/23/2017 12:42:52 PM) -> Akumar (06/20/2017 12:52:27 PM) -> (06/27/2017 05:30:48 PM) -> Sneha Singh (06/28/2017 03:11:34 AM)' AS INDIVIDUAL_NAMES
) 
SELECT 
  id, 
  LTRIM(SPLIT(CONCAT(' ', item), ' (')[OFFSET(0)]) AS name, 
  date_as_string,
  PARSE_DATETIME('%m/%d/%Y %r', date_as_string) AS date_as_datetime,
  PARSE_TIMESTAMP('%m/%d/%Y %r', date_as_string) AS date_as_timestamp
FROM yourTable, 
  UNNEST(SPLIT(INDIVIDUAL_NAMES, ' -> ')) AS item, 
  UNNEST([REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)')]) AS date_as_string 
-- ORDER BY 3   

Output is as below

id  name        date_as_string          date_as_datetime    date_as_timestamp    
1   Start       null                    null                null     
1   Akumar      01/06/2017 08:08:44 PM  2017-01-06T20:08:44 2017-01-06 20:08:44 UTC
1   Akumar      01/30/2017 10:34:33 AM  2017-01-30T10:34:33 2017-01-30 10:34:33 UTC
1   Akumar      03/15/2017 02:10:12 PM  2017-03-15T14:10:12 2017-03-15 14:10:12 UTC
1   Akumar      03/23/2017 12:42:52 PM  2017-03-23T12:42:52 2017-03-23 12:42:52 UTC
1   Akumar      06/20/2017 12:52:27 PM  2017-06-20T12:52:27 2017-06-20 12:52:27 UTC
1               06/27/2017 05:30:48 PM  2017-06-27T17:30:48 2017-06-27 17:30:48 UTC
1   Sneha Singh 06/28/2017 03:11:34 AM  2017-06-28T03:11:34 2017-06-28 03:11:34 UTC
1               11/11/2016 08:24:24 AM  2016-11-11T08:24:24 2016-11-11 08:24:24 UTC
1   Akumar      11/11/2016 11:15:33 AM  2016-11-11T11:15:33 2016-11-11 11:15:33 UTC  

would be really great if you can suggest a similar instance in legacy sql

You can try below for BigQuery Legacy SQL

#legacySQL
SELECT
  id,
  IF(name = CONCAT('(', date_as_string, ')'), '', name) AS name,
  date_as_string
FROM (
  SELECT 
    id, 
    NTH(1, SPLIT(item, ' (')) AS name,
    REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)') AS date_as_string
  FROM FLATTEN((
    SELECT id, SPLIT(INDIVIDUAL_NAMES, ' -> ') AS item, INDIVIDUAL_NAMES
    FROM yourTable 
  ), item)
)
-- ORDER BY date_as_string 

You can test it with dummy data you provided as below

#legacySQL
SELECT
  id,
  IF(name = CONCAT('(', date_as_string, ')'), '', name) AS name,
  date_as_string
FROM (
  SELECT 
    id, 
    NTH(1, SPLIT(item, ' (')) AS name,
    REGEXP_EXTRACT(item, r'(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d [AP]M)') AS date_as_string
  FROM FLATTEN((
    SELECT id, SPLIT(INDIVIDUAL_NAMES, ' -> ') AS item, INDIVIDUAL_NAMES
    FROM (
      SELECT 1 AS id, 'Start -> (11/11/2016 08:24:24 AM) -> Akumar (11/11/2016 11:15:33 AM) -> Akumar (01/06/2017 08:08:44 PM) -> Akumar (01/30/2017 10:34:33 AM) -> Akumar (03/15/2017 02:10:12 PM) -> Akumar (03/23/2017 12:42:52 PM) -> Akumar (06/20/2017 12:52:27 PM) -> (06/27/2017 05:30:48 PM) -> Sneha Singh (06/28/2017 03:11:34 AM)' AS INDIVIDUAL_NAMES
    ) AS yourTable 
  ), item)
)
ORDER BY date_as_string 

Upvotes: 1

Omid Fatemieh
Omid Fatemieh

Reputation: 66

Since I do not have access to your table, I used WITH clause to emulate your input table, using the provided input sample. With that, the following query produces the desired output dates. Note that this is a StandardSQL query:

#standardSQL
WITH
  input AS (
  SELECT
    "Start -> (11/11/2016 08:24:24 AM) -> Akumar (11/11/2016 11:15:33 AM) -> Akumar (01/06/2017 08:08:44 PM) -> Akumar (01/30/2017 10:34:33 AM) -> Akumar (03/15/2017 02:10:12 PM) -> Akumar (03/23/2017 12:42:52 PM) -> Akumar (06/20/2017 12:52:27 PM) -> (06/27/2017 05:30:48 PM) -> Sneha Singh (06/28/2017 03:11:34 AM)" AS input_group),
  split_output AS (
  SELECT
    SPLIT(input_group, "->") AS output
  FROM
    input)
SELECT
  REGEXP_EXTRACT(output,r"(\d\d/\d\d/\d\d\d\d \d\d:\d\d:\d\d)") AS date
FROM
  split_output so,
  so.output

Output (null is for the "Start" entry - note that sql output is not ordered):

Row date     
1   03/15/2017 02:10:12  
2   01/06/2017 08:08:44  
3   11/11/2016 08:24:24  
4   06/20/2017 12:52:27  
5   06/28/2017 03:11:34  
6   01/30/2017 10:34:33  
7   06/27/2017 05:30:48  
8   null     
9   11/11/2016 11:15:33  
10  03/23/2017 12:42:52

Upvotes: 1

Related Questions