Richard
Richard

Reputation: 65530

How do I create a list of dates, before doing a LEFT JOIN?

I am using BigQuery SQL. I have the following tables:

Table "public.org" (records all organisations)
    Column    │          Type          │ Modifiers
──────────────┼────────────────────────┼───────────
 code         │ character varying(6)   │ not null
 name         │ character varying(200) │ not null   
 setting      │ integer                │ not null    

 Table "public.spending" (records spending on chemical by org by month)
    Column         │          Type           │ Modifiers
───────────────────┼─────────────────────────┼───────────
 org_id            │ character varying(6)    │ not null
 month             │ date                    │ not null
 chemical_id       │ character varying(9)    │ not null
 actual_cost       │ double precision        │ not null

And I want to calculate the spending on a particular chemical by month, by organisation. The complication is if there was no spending by an organisation on that chemical in a month, there is simply no entry in the spending table, rather than a zero entry. However, I would like output (a null or zero result, I don't mind which).

Right now I have this, which gives me total spending for all organisations including those that had no entries, but does not separate spending out by month:

SELECT
  org.code AS code,
  org.name AS name,
  num.actual_cost as actual_cost
FROM (
  SELECT
    code,
    name
  FROM
    org
  WHERE
    setting=4) AS orgs
LEFT OUTER JOIN EACH (
  SELECT
    org_id,
    SUM(actual_cost) AS actual_cost
  FROM
    spending
  WHERE
    chemical_id='1202010U0AAAAAA'
  GROUP BY
    org_id) AS num
ON
  num.org_id = orgs.code

So now I need to extend it to do a LEFT JOIN by month and organisation. I know that I can get the unique months in the spending table by doing this:

SELECT month FROM spending GROUP BY month

(NB BigQuery doesn't support UNIQUE.)

But how do I get all the unique rows for month and organisation, and only then do a LEFT JOIN onto the spending?

Upvotes: 0

Views: 1273

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

I would suggested following steps for you to get through:

STEP 1 - identify months range (start and finish)
month is assumed to be presented in format YYYY-MM-01
if it is in different format - code should be slightly adjusted

SELECT 
  MIN(month) as start, 
  MAX(month) as finish
FROM public.spending

Assume Result of Step 1 is '2014-10-01' as start, '2015-05-01' as finish

Step 2 - produce all months in between Start and Finish

SELECT DATE(DATE_ADD(TIMESTAMP('2000-01-01'), pos - 1, "MONTH")) AS month
FROM (
  SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
  SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)),h
))) nums 
CROSS JOIN (
  SELECT '2014-10-01' AS start, '2015-05-01' AS finish // <<-- Replace with SELECT from Step 1
) range
WHERE pos BETWEEN 1 AND 1000
AND DATE(DATE_ADD(TIMESTAMP('2000-01-01'), pos - 1, "MONTH")) 
    BETWEEN start AND finish

So, now - Result of Step 2 is

month    
2014-10-01   
2014-11-01   
2014-12-01   
2015-01-01   
2015-02-01   
2015-03-01   
2015-04-01   
2015-05-01

It has all months, even if some are missed in public.spending table in between start and finish

I think the rest is trivial and you have already main code for it. Let me know if this is not accurate and you need help in completing above steps

Upvotes: 0

Pentium10
Pentium10

Reputation: 207912

If we are talking about calendar months there we have only 12 options (Jan => Dec).

Just compile a static table or in the query itself as 12 selects that form a table, and use that to join.

select * from 
(select 1 as m),
(select 2 as m),
....
(select 12 as m)

you might also be interested in the Technics mentioned in other posts :

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

I'm not sure if this works in bigquery, but this is the structure of a query that does what you want:

select org.name, org.code, m.month, sum(s.actual_cost)
from org cross join
     (select month from public.spending group by month) m left join
     pubic.spending s
     on s.ord_ig = org.code and s.month = m.month
where prescribing_setting = 4
group by org.name, org.code, m.month;

Upvotes: 0

Related Questions