jl6
jl6

Reputation: 6394

How do I write a query that imputes values for records that are not present in a table?

I have a table that looks like this:

MONTH | WIDGET | VALUE
------+--------+------
 Dec  |   A    |   3
 Jan  |   B    |   5
 Feb  |   B    |   6
 Mar  |   B    |   7

and I want to write a query that produces, for each MONTH and WIDGET the difference in VALUE between the current month and the previous month. So I want an output table like this:

MONTH | WIDGET | VALUE
------+--------+------
 Dec  |   A    |   3
 Jan  |   A    |  -3
 Feb  |   A    |   0
 Mar  |   A    |   0
 Dec  |   B    |   0
 Jan  |   B    |   5
 Feb  |   B    |   1
 Mar  |   B    |   1

If there is no recorded value for the previous month for a given widget, I want to assume the previous month's value is zero. Conversely, if there is no recorded value for the current month, I want to assume the current month's value is zero.

I believe a cross join over all combinations of month and widget might work, by giving me a "spine" to which I can left join my data and then use coalesce - but is there a better way?

Edit: We can assume the MONTH column actually has a numeric representation to make it easier to identify the previous.

Upvotes: 0

Views: 1178

Answers (3)

Niederee
Niederee

Reputation: 4295

I would use the lag function. IBM Reference I just defaulted to 0 for values whose prior value doesn't exist but you can handle that a number of different ways.

   create temp table test (
    mth date
    ,widget char(1)
    ,value integer
    )
    distribute on random;

    insert into test values('2013-12-01','A',3);
    insert into test values('2014-01-01','A',-3);
    insert into test values('2014-02-01','A',0);
    insert into test values('2014-03-01','A',0);
    insert into test values('2013-12-01','B',0);
    insert into test values('2014-01-01','B',5);
    insert into test values('2014-02-01','B',1);
    insert into test values('2014-03-01','B',1);


    select *
    ,lag(value,1) over(partition by widget order by mth) as prior_row
    ,value - nvl(lag(value,1) over(partition by widget order by mth),0) as diff
    from test

Upvotes: 1

Joseph B
Joseph B

Reputation: 5669

Here is another alternative to get the required data. Two CTE's are used, including one to contain the month numbers.

The SQL Fiddle can be accessed here.

WITH month_order as 
(
  SELECT 'Jan' as month, 1 as month_no, 12 as prev_month_no
  UNION ALL
  SELECT 'Feb' as month, 2 as month_no, 1 as prev_month_no
  UNION ALL
  SELECT 'Mar' as month, 3 as month_no, 2 as prev_month_no
  UNION ALL
  SELECT 'Apr' as month, 4 as month_no, 3 as prev_month_no
  UNION ALL
  SELECT 'May' as month, 5 as month_no, 4 as prev_month_no
  UNION ALL
  SELECT 'Jun' as month, 6 as month_no, 5 as prev_month_no
  UNION ALL
  SELECT 'Jul' as month, 7 as month_no, 6 as prev_month_no
  UNION ALL
  SELECT 'Aug' as month, 8 as month_no, 7 as prev_month_no
  UNION ALL
  SELECT 'Sep' as month, 9 as month_no, 8 as prev_month_no
  UNION ALL
  SELECT 'Oct' as month, 10 as month_no, 9 as prev_month_no
  UNION ALL
  SELECT 'Nov' as month, 11 as month_no, 10 as prev_month_no
  UNION ALL
  SELECT 'Dec' as month, 12 as month_no, 11 as prev_month_no
)  

, values_all_months as 
(
SELECT
    month_order.prev_month_no as prev_month_no
    , month_order.month_no as month_no
    , w4.month as month
    , w4.widget as widget
    , COALESCE(w3.value, 0) as value
FROM widgets w3
RIGHT OUTER JOIN
(
  SELECT
  w1.widget as widget
  ,w2.month as month
FROM
(SELECT
  DISTINCT
  widget
FROM widgets) w1,
(SELECT 
  DISTINCT
  month
FROM widgets) w2
) w4
ON w3.month = w4.month and w3.widget = w4.widget
INNER JOIN month_order
ON w4.month = month_order.month
)

SELECT mo.month, vam1.widget, vam1.value - COALESCE(vam2.value, 0) VALUE
FROM values_all_months vam1
LEFT OUTER JOIN values_all_months vam2
ON vam1.widget = vam2.widget AND vam1.prev_month_no = vam2.month_no
INNER JOIN month_order mo
ON vam1.month_no = mo.month_no
ORDER BY vam1.widget, (SELECT CASE vam1.month_no WHEN 12 THEN 0 ELSE vam1.month_no END);

Upvotes: 0

sarin
sarin

Reputation: 5307

OK. I have solved this in MS SQL but it should be transferable to PostgresQL. I have SQLFiddled the answer:

CREATE TABLE WidgetMonths (Month tinyint, Widget varchar(1), Value int)

CREATE TABLE Months (Month tinyint, MonthOrder tinyint)

insert into WidgetMonths Values 
 (12, 'A', 3), 
 (1,'B', 5),
 (2,'B', 6),
 (3,'B', 7);


insert into Months Values
 (12, 1), (1, 2), (2, 3), (3, 4)


Select 
AllWidgetMonths.Widget, 
AllWidgetMonths.Month,
IsNull(wm.Value,0) - IsNull(wmn.Value,0) as Value
from (
      select Distinct Widget, Months.Month, Months.MonthOrder
      from WidgetMonths
      Cross Join months 
      ) AllWidgetMonths
left join WidgetMonths wm on wm.Widget = AllWidgetMonths.Widget
                          AND wm.Month =  AllWidgetMonths.Month

left join WidgetMonths wmn on wmn.Widget = AllWidgetMonths.Widget
                          AND Case When wmn.Month = 12 Then 1 Else wmn.Month + 1 End =  AllWidgetMonths.Month
Order by AllWidgetMonths.Widget, AllWidgetMonths.MonthOrder

I have started off with a Table of WidgetMonths from your example the only difference being I have converted the months into a representative integer.

I have then Created the Months Table of All months we are interested in from your example. If you want months for the whole year you can simply add to this table or find another way of generating a 1-12 row result set. The MonthOrder is optional and just helped me achieve your answer ordering.

As you mentioned AllwidgetMonths has the Cross join which gives us all combinations of Widgets and Months. This maybe better achieved by a Cross join between 'Widgets' Table and the Months Table. But I wasn't sure if this existed so left this out.

We left join WidgetMonths onto our master table of All widget months to show us which months we have a value for.

The trick up the sleeve is then left joining the same table again but this time adding 1 to the month number inside the join. This shifts the rows down one. Notice I have a Case statement (not sure about this in PostgresSql) to deal with the roll over of Month 12 to Month 1. This effectively gives me the values for each month and its previous on each row of AllwidgetMonths.

The final bit is to take one value from the other.

Hey presto. I can try to update this to PostgresSQL but you may have more knowledge and can solve it quicker than I.

Upvotes: 0

Related Questions