bunny
bunny

Reputation: 354

Filtering out results from subquery

I have a TableA in BQ with a list of customer_ids and recommended product(ean) columns. I have a View B with all sales data including customer_ids and ean.

I want to create a table or write a query where I filter out all the recommended eans which are already being purchased by the customer.

Here is what I have so far:

SELECT
  backend_customer_id,
  ean
FROM
  TableA
WHERE
  1=1
  AND backend_customer_id IN (
  SELECT
    backend_customer_id
  FROM
    ViewB
  WHERE
    ean NOT IN (
    SELECT
      ean
    FROM
      TableA))
ORDER BY
  1,2

For some reason, not all the purchased ean's are getting filtered out. I am using legacy SQL syntax.

Upvotes: 0

Views: 1122

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

Below is for BigQuery Standard SQL

#standardSQL
SELECT t.backend_customer_id, t.ean
FROM TableA t LEFT JOIN ViewB v
ON (t.backend_customer_id, t.ean) = (v.backend_customer_id, v.ean) 
WHERE v.ean IS NULL

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1271023

I think the query you need is a bit simpler:

SELECT backend_customer_id, ean
FROM TableA a LEFT JOIN
     Viewb b
     ON a.backend_customer_id = b.backend_customer_id AND
        a.ean = b.ean
WHERE b.backend_customer_id IS NULL;

That is, do a LEFT JOIN. Then keep all the records that don't match.

Upvotes: 1

Willian Fuks
Willian Fuks

Reputation: 11797

Maybe this query in Standard SQL gives you an insight on how to adapt your query in Legacy:

WITH table_A AS(
select '1' as customer, '1' recommended_ean union all
select '1' as customer, '2' recommended_ean union all
select '1' as customer, '3' recommended_ean union all
select '2' as customer, '1' recommended_ean
),
table_b as(
select '1' as customer, '1' purchased_ean union all
select '2' as customer, '2' purchased_ean
)

SELECT
  customer,
  recommended_ean
FROM
  table_A a
WHERE 1 = 1
AND NOT EXISTS(SELECT 1 FROM table_B b WHERE a.customer = b.customer AND a.recommended_ean = b.purchased_ean)
ORDER BY 1, 2

Notice that in your query you're not actually filtering out EANs but rather customers. If the customer didn't purchase anything it then proceeds to return all recommended eans.

You should adapt your query to have a WHERE clause that filters EANs, something like:

WHERE 1 = 1
AND recommended_ean NOT IN (SELECT purchased_ean FROM table_b b WHERE a.customer = b.customer)

I highly recommend that you start using the Standard Version of BigQuery though as it offers more features and is an improved syntax overall.

Upvotes: 1

Related Questions