Reputation: 34145
Consider the following table (simplified version):
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
The above schema is used to store POS receipts for restaurants. Now, this table sometimes contains a receipt from same date, same transaction_no at same location_id.
In which case what I want to do is get the last receipt of that location_id & transaction_no order by created_at desc.
In MySQL, I use the following query which gets me last (max(created_at)
receipt for a location_id & transaction_no:
SELECT id, amount, transaction_no, location_id, created_at
FROM receipts r JOIN
(SELECT transaction_no, max(created_at) AS maxca
FROM receipts r
GROUP BY transaction_no
) t
ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
group by location_id;
But when I run the same in BigQuery, I get the following error:
Query Failed Error: Shuffle reached broadcast limit for table __I0 (broadcasted at least 150393576 bytes). Consider using partitioned joins instead of broadcast joins . Job ID: circular-gist-812:job_A_CfsSKJICuRs07j7LHVbkqcpSg
Any idea how to make the above query work in BigQuery?
Upvotes: 2
Views: 826
Reputation: 172993
SELECT id, amount, transaction_no, location_id, created_at
FROM (
SELECT
id, amount, transaction_no, location_id, created_at,
ROW_NUMBER() OVER(PARTITION BY transaction_no, location_id
ORDER BY created_at DESC) as last
FROM your_dataset.your_table
)
WHERE last = 1
Upvotes: 4