Pentium10
Pentium10

Reputation: 208032

NOT IN usage with GROUP BY and HAVING

We are trying to SUM all order revenue, where we didn't registered payment success event. (eg: failed orders) A failed order is either without payment attempt, or a failed payment event.

SELECT
o.campaign.id AS campaign_id,
o.campaign.template_id AS campaign_template_id,
o.campaign.lang AS language,
o.campaign.split_idx AS campaign_split_id,
count(distinct o.order_id) AS orders,
SUM(o.total) AS totalrevenue,
SUM(o.shipping_value) AS shipping_value
FROM [wr_live.order] AS o
GROUP BY campaign_id, campaign_template_id, language, campaign_split_id
HAVING
o.order_id NOT IN (
SELECT order_id
FROM [wr_live.order_event] as e
WHERE e.order_id = o.order_id AND e.event = 'payment' and e.status = 1
) 

.

Query Failed
Error: Field 'o.order_id' not found in table 'e'; did you mean 'order_id'?
Job ID: aerobic-forge-504:job_h41lud83lyqiD7p6qldXMl_tx0A

As you see we are running a NOT IN clause where we check if there we don't have for that particular order any event with payment, and status =1.

We have Case 05024161 on paid Google enterprise support, but apparently they were not able to help.

Upvotes: 2

Views: 430

Answers (2)

sprocket
sprocket

Reputation: 1277

As a member of the BigQuery team, I was able to run your query and experiment with different approaches. It seems that it can run with an anti-join (NOT IN) as long as you remove the reference to o.order_id and modify some of the field names.

SELECT
  campaign.id AS campaign_id,
  campaign.template_id AS campaign_template_id,
  campaign.lang AS language,
  campaign.split_idx AS campaign_split_id,
  COUNT(DISTINCT order_id) AS orders,
  SUM(total) AS totalrevenue,
  SUM(shipping_value) AS shipping_value
FROM
  [wr_live.order]
WHERE
  order_id NOT IN (
  SELECT
    order_id
  FROM
    [wr_live.order_event]
  WHERE
    event = 'payment'
    AND status = 1
    )
GROUP BY
  campaign_id,
  campaign_template_id,
  language,
  campaign_split_id;

This query ran a few times in less than 10 seconds. I can't guarantee that it will always be that fast, but you might want to try it.

Upvotes: 1

N.N.
N.N.

Reputation: 3172

"Note that the HAVING clause can only refer to fields defined in your SELECT clause (if the field has an alias, you must use it; if it doesn't, use the aggregate field name instead)."

https://cloud.google.com/bigquery/query-reference#having

in your query there is no "Order_Id" in the select clause. try putting it in a where clause before the group by.

something like:

SELECT
o.campaign.id AS campaign_id,
o.campaign.template_id AS campaign_template_id,
o.campaign.lang AS language,
o.campaign.split_idx AS campaign_split_id,
count(distinct o.order_id) AS orders,
SUM(o.total) AS totalrevenue,
SUM(o.shipping_value) AS shipping_value
FROM(
Select * from  [wr_live.order] 
Where order_id NOT IN (
SELECT order_id
FROM [wr_live.order_event] as e
WHERE  e.event = 'payment' and e.status = 1
) 
)AS o
GROUP BY campaign_id, campaign_template_id, language, campaign_split_id

In addition BQ does not support sub-query reference to parent query. if you need it, try to replace with Join

Upvotes: 3

Related Questions