Reputation: 354
I have a TableA in BQ with a list of customer_ids and recommended product(ean) columns. I have a View B with all sales data including customer_ids and ean.
I want to create a table or write a query where I filter out all the recommended eans which are already being purchased by the customer.
Here is what I have so far:
SELECT
backend_customer_id,
ean
FROM
TableA
WHERE
1=1
AND backend_customer_id IN (
SELECT
backend_customer_id
FROM
ViewB
WHERE
ean NOT IN (
SELECT
ean
FROM
TableA))
ORDER BY
1,2
For some reason, not all the purchased ean's are getting filtered out. I am using legacy SQL syntax.
Upvotes: 0
Views: 1122
Reputation: 173190
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.backend_customer_id, t.ean
FROM TableA t LEFT JOIN ViewB v
ON (t.backend_customer_id, t.ean) = (v.backend_customer_id, v.ean)
WHERE v.ean IS NULL
Upvotes: 1
Reputation: 1271023
I think the query you need is a bit simpler:
SELECT backend_customer_id, ean
FROM TableA a LEFT JOIN
Viewb b
ON a.backend_customer_id = b.backend_customer_id AND
a.ean = b.ean
WHERE b.backend_customer_id IS NULL;
That is, do a LEFT JOIN
. Then keep all the records that don't match.
Upvotes: 1
Reputation: 11797
Maybe this query in Standard SQL gives you an insight on how to adapt your query in Legacy:
WITH table_A AS(
select '1' as customer, '1' recommended_ean union all
select '1' as customer, '2' recommended_ean union all
select '1' as customer, '3' recommended_ean union all
select '2' as customer, '1' recommended_ean
),
table_b as(
select '1' as customer, '1' purchased_ean union all
select '2' as customer, '2' purchased_ean
)
SELECT
customer,
recommended_ean
FROM
table_A a
WHERE 1 = 1
AND NOT EXISTS(SELECT 1 FROM table_B b WHERE a.customer = b.customer AND a.recommended_ean = b.purchased_ean)
ORDER BY 1, 2
Notice that in your query you're not actually filtering out EANs but rather customers. If the customer didn't purchase anything it then proceeds to return all recommended eans.
You should adapt your query to have a WHERE
clause that filters EANs, something like:
WHERE 1 = 1
AND recommended_ean NOT IN (SELECT purchased_ean FROM table_b b WHERE a.customer = b.customer)
I highly recommend that you start using the Standard Version of BigQuery though as it offers more features and is an improved syntax overall.
Upvotes: 1