Reputation: 35
We switched to a new platform on Jan 1 and I need to union two tables to get a data source with the old data and new data combined. However, some accounts had to be switched out of the old platform before Jan 1.
The new data table has data from December for all accounts but I only want to use the new December data where there is no old December data. How can I union the new data with most accounts data starting on Jan 1 but also with the outlier few accounts starting on the proper day in December?
Ex: For Account1 I need new data starting Jan 1; for Account2 I need new data from Dec 30; for Account 3 I need new data from Dec 31
Old Table
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account2 12-29-18 10
Account3 12-29-18 20
Account3 12-30-18 10
New Table
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
Output
------------------------------------
Account Date Sales
------------------------------------
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-29-18 10
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-29-18 20
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
Upvotes: 2
Views: 72
Reputation: 173190
Below is for BigQuery Standard SQL
#standardSQL
SELECT account, date,
ARRAY_AGG(sales ORDER BY data LIMIT 1)[OFFSET(0)] sales
FROM (
SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL
SELECT 'new' data, * FROM `project.dataset.new_table`
)
GROUP BY account, date
You can test, play with above using example data from your question as
#standardSQL
WITH `project.dataset.old_table` AS (
SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
SELECT 'Account1', '12-30-18', 10 UNION ALL
SELECT 'Account1', '12-31-18', 5 UNION ALL
SELECT 'Account2', '12-29-18', 10 UNION ALL
SELECT 'Account3', '12-29-18', 20 UNION ALL
SELECT 'Account3', '12-30-18', 10
), `project.dataset.new_table` AS (
SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
SELECT 'Account1', '12-30-18', 10 UNION ALL
SELECT 'Account1', '12-31-18', 5 UNION ALL
SELECT 'Account1', '01-01-19', 20 UNION ALL
SELECT 'Account2', '12-30-18', 15 UNION ALL
SELECT 'Account2', '12-31-18', 20 UNION ALL
SELECT 'Account2', '01-01-19', 10 UNION ALL
SELECT 'Account3', '12-30-18', 10 UNION ALL
SELECT 'Account3', '12-31-18', 20 UNION ALL
SELECT 'Account3', '01-01-19', 5
)
SELECT account, date,
ARRAY_AGG(sales ORDER BY data LIMIT 1)[OFFSET(0)] sales
FROM (
SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL
SELECT 'new' data, * FROM `project.dataset.new_table`
)
GROUP BY account, date
ORDER BY account, PARSE_DATE('%m-%d-%y', date)
with result
Row account date sales
1 Account1 12-29-18 10
2 Account1 12-30-18 10
3 Account1 12-31-18 5
4 Account1 01-01-19 20
5 Account2 12-29-18 10
6 Account2 12-30-18 15
7 Account2 12-31-18 20
8 Account2 01-01-19 10
9 Account3 12-29-18 20
10 Account3 12-30-18 10
11 Account3 12-31-18 20
12 Account3 01-01-19 5
Upvotes: 1