Reputation: 65530
I am using BigQuery SQL. I have the following tables:
Table "public.org" (records all organisations)
Column │ Type │ Modifiers
──────────────┼────────────────────────┼───────────
code │ character varying(6) │ not null
name │ character varying(200) │ not null
setting │ integer │ not null
Table "public.spending" (records spending on chemical by org by month)
Column │ Type │ Modifiers
───────────────────┼─────────────────────────┼───────────
org_id │ character varying(6) │ not null
month │ date │ not null
chemical_id │ character varying(9) │ not null
actual_cost │ double precision │ not null
And I want to calculate the spending on a particular chemical by month, by organisation. The complication is if there was no spending by an organisation on that chemical in a month, there is simply no entry in the spending
table, rather than a zero entry. However, I would like output (a null or zero result, I don't mind which).
Right now I have this, which gives me total spending for all organisations including those that had no entries, but does not separate spending out by month:
SELECT
org.code AS code,
org.name AS name,
num.actual_cost as actual_cost
FROM (
SELECT
code,
name
FROM
org
WHERE
setting=4) AS orgs
LEFT OUTER JOIN EACH (
SELECT
org_id,
SUM(actual_cost) AS actual_cost
FROM
spending
WHERE
chemical_id='1202010U0AAAAAA'
GROUP BY
org_id) AS num
ON
num.org_id = orgs.code
So now I need to extend it to do a LEFT JOIN by month and organisation. I know that I can get the unique months in the spending
table by doing this:
SELECT month FROM spending GROUP BY month
(NB BigQuery doesn't support UNIQUE
.)
But how do I get all the unique rows for month and organisation, and only then do a LEFT JOIN onto the spending?
Upvotes: 0
Views: 1273
Reputation: 172993
I would suggested following steps for you to get through:
STEP 1 - identify months range (start and finish)
month is assumed to be presented in format YYYY-MM-01
if it is in different format - code should be slightly adjusted
SELECT
MIN(month) as start,
MAX(month) as finish
FROM public.spending
Assume Result of Step 1 is '2014-10-01' as start, '2015-05-01' as finish
Step 2 - produce all months in between Start and Finish
SELECT DATE(DATE_ADD(TIMESTAMP('2000-01-01'), pos - 1, "MONTH")) AS month
FROM (
SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)),h
))) nums
CROSS JOIN (
SELECT '2014-10-01' AS start, '2015-05-01' AS finish // <<-- Replace with SELECT from Step 1
) range
WHERE pos BETWEEN 1 AND 1000
AND DATE(DATE_ADD(TIMESTAMP('2000-01-01'), pos - 1, "MONTH"))
BETWEEN start AND finish
So, now - Result of Step 2 is
month
2014-10-01
2014-11-01
2014-12-01
2015-01-01
2015-02-01
2015-03-01
2015-04-01
2015-05-01
It has all months, even if some are missed in public.spending table in between start and finish
I think the rest is trivial and you have already main code for it. Let me know if this is not accurate and you need help in completing above steps
Upvotes: 0
Reputation: 207912
If we are talking about calendar months there we have only 12 options (Jan => Dec).
Just compile a static table or in the query itself as 12 selects that form a table, and use that to join.
select * from
(select 1 as m),
(select 2 as m),
....
(select 12 as m)
you might also be interested in the Technics mentioned in other posts :
Upvotes: 1
Reputation: 1269873
I'm not sure if this works in bigquery, but this is the structure of a query that does what you want:
select org.name, org.code, m.month, sum(s.actual_cost)
from org cross join
(select month from public.spending group by month) m left join
pubic.spending s
on s.ord_ig = org.code and s.month = m.month
where prescribing_setting = 4
group by org.name, org.code, m.month;
Upvotes: 0