AdilK
AdilK

Reputation: 707

BigQuery - Same query, REGEXP_CONTAINS processed 3X less data than IN Operator? Details inside

I'm trying to extract data where the SKU matches either of two values."GGOEGGCX056299|GGOEGAAX0104"

When I run the REGEXP_CONTAINS version, it uses 3X less space from my query quota [17.6 MB vs 51.5 MB using IN operator]. My Regex version is also set to search for specific SKU's via the pipe symbol so I'm wondering what caused the REGEX version to use less space in processing the query compared to the IN operator that also searched for two specific SKU's?

Any help with understanding the difference and how can I make my queries more efficient? Thanks.

SELECT
date, 
prod.productSKU AS SKU,
SUM(prod.productQuantity) AS qty_purchased
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`, 
UNNEST (hits) hit, UNNEST(product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
AND
REGEXP_CONTAINS (prod.productSKU,"GGOEGGCX056299|GGOEGAAX0104")
GROUP BY date, SKU
ORDER BY date ASC

When I run the IN version to pull the same data, it says used 51.5 MB

SELECT
date, 
prod.productSKU AS SKU,
SUM(prod.productQuantity) AS qty_purchased
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`, 
UNNEST (hits) hit, UNNEST(product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170331' 
AND
prod.productSKU IN ("GGOEGGCX056299", "GGOEGAAX0104")
GROUP BY date, SKU
ORDER BY date ASC

enter image description here enter image description here

Upvotes: 0

Views: 502

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

it uses 3X less space from my query quota [17.6 MB vs 51.5 MB]

Below is why!!!

in first query you have

WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'

while in second

WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170331' 

Obviously , second query covering more tables thus difference in bytes - one month vs three months - thus ~3x difference

Upvotes: 1

Related Questions