jamiet
jamiet

Reputation: 12264

Extracting an element from a cell that contains a dictionary

I have the value category:ops,client:acompany,type:sometype which as you can see is effectively a dictionary, I would like to extract the value for the dictionary key client, in other words I want to extract acompany.

Here's how I've done it:

select CASE WHEN INSTR(client_step1, ",") > 0 THEN SUBSTR(client_step1, 0, INSTR(client_step1, ",") - 1) 
            ELSE client_step1 
       END AS client
from (
    select CASE WHEN INSTR(dict, "client") > 0 THEN SUBSTR(dict, INSTR(dict, "client") + 7) 
                ELSE CAST(NULL as STRING) 
           END  as client_step1
    from (
        select "category:ops,client:acompany,type:sometype" as dict
        )
    )

enter image description here

but that seems rather verbose (and frankly, slicing up strings with a combination of INSTR(), SUBSTR() and derived tables feels a bit meh). I'm wondering if there's a better way to do it that I don't know about (I'm fairly new to bq).

thanks in advance

Upvotes: 1

Views: 3923

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Another option to parse dictionary like yours is as below (for BigQuery Standard SQL)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, "category:ops,client:acompany,type:sometype" AS dict
)
SELECT id,
  ARRAY(
    SELECT AS STRUCT 
      SPLIT(x, ':')[OFFSET(0)] key, 
      SPLIT(x, ':')[OFFSET(1)] value
    FROM UNNEST(SPLIT(dict)) x 
  ) items
FROM `project.dataset.table`   

with result as below

Row id  items.key   items.value  
1   1   category    ops  
        client      acompany     
        type        sometype       

As you can see here - you parse out all dictionary items

If you still need to only specific element's value - you can use below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, "category:ops,client:acompany,type:sometype" AS dict
)
SELECT id,
  ( SELECT 
      SPLIT(x, ':')[OFFSET(1)]
    FROM UNNEST(SPLIT(dict)) x 
    WHERE SPLIT(x, ':')[OFFSET(0)] = 'client'
    LIMIT 1
  ) client
FROM `project.dataset.table`  

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33745

It sounds like you want the REGEXP_EXTRACT function. Here is an example:

SELECT REGEXP_EXTRACT(dict, r'client:([^,:]+)') AS client_step1
FROM (
  SELECT "category:ops,client:acompany,type:sometype" AS dict
)

This returns the string acompany as its result. The regexp looks for client: inside the string, and matches everything after it up until the next , or : or of the end of the string.

Upvotes: 2

Related Questions