Reputation: 707
I'm trying to extract data where the SKU matches either of two values."GGOEGGCX056299|GGOEGAAX0104"
When I run the REGEXP_CONTAINS version, it uses 3X less space from my query quota [17.6 MB vs 51.5 MB using IN operator]. My Regex version is also set to search for specific SKU's via the pipe symbol so I'm wondering what caused the REGEX version to use less space in processing the query compared to the IN operator that also searched for two specific SKU's?
Any help with understanding the difference and how can I make my queries more efficient? Thanks.
SELECT
date,
prod.productSKU AS SKU,
SUM(prod.productQuantity) AS qty_purchased
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hit, UNNEST(product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
AND
REGEXP_CONTAINS (prod.productSKU,"GGOEGGCX056299|GGOEGAAX0104")
GROUP BY date, SKU
ORDER BY date ASC
When I run the IN version to pull the same data, it says used 51.5 MB
SELECT
date,
prod.productSKU AS SKU,
SUM(prod.productQuantity) AS qty_purchased
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) hit, UNNEST(product) prod
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170331'
AND
prod.productSKU IN ("GGOEGGCX056299", "GGOEGAAX0104")
GROUP BY date, SKU
ORDER BY date ASC
Upvotes: 0
Views: 502
Reputation: 172994
it uses 3X less space from my query quota [17.6 MB vs 51.5 MB]
Below is why!!!
in first query you have
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170131'
while in second
WHERE _TABLE_SUFFIX BETWEEN '20170101' AND '20170331'
Obviously , second query covering more tables thus difference in bytes - one month vs three months - thus ~3x difference
Upvotes: 1