ljuk
ljuk

Reputation: 784

Query array column in BigQuery by condition

I have a table in Bigquery with this format:

+------------+-----------------+------------+-----------------+---------------------------------+
| event_date | event_timestamp | event_name | event_params.key| event_params.value.string_value |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110   | 2929929292      | my_event   | previous_page   | /some-page                      |
+------------+-----------------+------------+-----------------+---------------------------------+
|                                           | layer           | /some-page/layer                |
|                                           +-----------------+---------------------------------+
|                                           | session_id      | 99292                           |
|                                           +-----------------+---------------------------------+
|                                           | user._id        | 2929292                         |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110   | 2882829292      | my_event   | previous_page   | /some-page                      |
+------------+-----------------+------------+-----------------+---------------------------------+
|                                           | layer           | /some-page/layer                |
|                                           +-----------------+---------------------------------+
|                                           | session_id      | 29292                           |
|                                           +-----------------+---------------------------------+
|                                           | user_id         | 229292                          |
+-------------------------------------------+-----------------+---------------------------------+

I want to perform a query to get all rows where event_params.value.string_value contains the regex /layer.

I have tried this:

SELECT 
    "event_params.value.string_value",  
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100

But I'm getting this output:

+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+

Some ideas of what I'm doing wrong?

Upvotes: 0

Views: 463

Answers (1)

Martin Weitzmann
Martin Weitzmann

Reputation: 4746

You are selecting a string - you should select a column.

The other problem is that you're cross joining the table with its arrays - effectively bloating up the table.

Your solution is to use a subquery in the WHERE clause:

SELECT 
    *  -- Not sure what you actually need from the table ...
FROM `my_project.my_dataset.my_events_20210110`
WHERE 
  -- COUNT(*)>0 means "if you find more than zero" then return TRUE
  (SELECT COUNT(*)>0 FROM UNNEST(event_params) AS event_param 
  WHERE event_param.key = 'layer' AND
  REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
  )
LIMIT 100

If you actually want the values from the array your quick solution is removing the quotes:

SELECT 
    event_params.value.string_value  
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100

Upvotes: 1

Related Questions