netllama

Reputation: 381

Querying row counts segregated by date ranges

I have a PostgreSQL 9.2.1 database and I'm trying, without success, to write a SQL query that shows the count of failures (current_status='FAILED') for each distinct test (testname), broken down by month (last_update), with 0 shown for months that had no failures. Here's the table definition:

                                       Table "public.tests"
     Column     |            Type             |                          Modifiers                          
----------------+-----------------------------+-------------------------------------------------------------
 id             | bigint                      | not null default nextval('tests_id_seq'::regclass)
 testname       | text                        | not null
 last_update    | timestamp without time zone | not null default now()
 current_status | text                        | not null

What I'd like to get back from that is something like this:

 testname    | Jan2012  | Feb2012  | Mar2012  | Apr2012  | May2012   | Jun2012   | Jul2012   | Aug2012   | Sep2012   | Oct2012   | Nov2012   | Dec2012
-------------+-----------------------------------------------------------------------------------------------------------------------------------------
 abq         |   2      |   5      |   2      |   0      |   7       |  4        |   8       |   0       |     6     |   15      |  1        |  0
 bar         |   0      |   0      |   2      |   0      |   9       |  8        |   8       |   2       |     6     |   15      |  1        |  1
 cho         |   15     |   1      |   2      |   3      |   4       |  8        |   7       |   3       |     6     |   1       |  5        |  6

At this point, the best that I could come up with is the following, which is admittedly not close:

SELECT testname, count(current_status) AS failure_count
FROM tests
WHERE current_status='FAILED'
AND last_update>'2012-09-01'
AND last_update<='2012-09-30'
GROUP by testname
ORDER BY testname ;

I think I'd need to somehow use COALESCE to get 0 values to show up in the results, plus some crazy JOINs to show multiple months of results, and maybe even a window function?

Upvotes: 2

Views: 946

Answers (1)

Erwin Brandstetter

Reputation: 656982

Use the crosstab() function with two parameters.

Something like this should work to get the values for 2012:

SELECT * FROM crosstab(
     $$SELECT testname, to_char(last_update, 'mon_YYYY'), count(*)::int AS ct
        FROM   tests
        WHERE  current_status = 'FAILED'
        AND    last_update >= '2012-01-01 0:0'
        AND    last_update <  '2013-01-01 0:0'  -- proper date range!
        GROUP  BY 1,2
        ORDER  BY 1,2$$

    ,$$VALUES
      ('jan_2012'::text), ('feb_2012'), ('mar_2012')
    , ('apr_2012'), ('may_2012'), ('jun_2012')
    , ('jul_2012'), ('aug_2012'), ('sep_2012')
    , ('oct_2012'), ('nov_2012'), ('dec_2012')$$)
AS ct (testname  text
   , jan_2012 int, feb_2012 int, mar_2012 int
   , apr_2012 int, may_2012 int, jun_2012 int
   , jul_2012 int, aug_2012 int, sep_2012 int
   , oct_2012 int, nov_2012 int, dec_2012 int);

Find detailed explanation under this related question.
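
Note that crosstab() is not part of core PostgreSQL; it ships with the additional module tablefunc, which has to be installed once per database. A minimal sketch, assuming the contrib package is available on the server and your role is allowed to create extensions:

```sql
-- Install the module that provides crosstab() (once per database).
CREATE EXTENSION IF NOT EXISTS tablefunc;
```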

I didn't test this at first; as @Craig commented, sample values would have helped.
It is now tested with my own test case.

Don't display NULL values

The main problem (that months without rows wouldn't show up at all) is averted by the crosstab() function with two parameters.

You cannot use COALESCE in the inner query, because the NULL values are inserted by crosstab() itself. You could ...

1. Wrap the whole thing into a subquery:

SELECT testname
      ,COALESCE(jan_2012, 0) AS jan_2012
      ,COALESCE(feb_2012, 0) AS feb_2012
      ,COALESCE(mar_2012, 0) AS mar_2012
      , ...
FROM (
    -- query from above
    ) x;

2. LEFT JOIN the primary query to the full list of months.

In this case you don't need the second parameter, because the input already contains a row for every combination of test and month.
For a bigger range you could use generate_series() to create the values.

SELECT * FROM crosstab(
     $$SELECT t.testname, m.mon, count(x.testname)::int AS ct
       FROM  (
          VALUES
           (1, 'jan_2012'::text), (2, 'feb_2012'), (3, 'mar_2012')
          ,(4, 'apr_2012'), (5, 'may_2012'), (6, 'jun_2012')
          ,(7, 'jul_2012'), (8, 'aug_2012'), (9, 'sep_2012')
          ,(10, 'oct_2012'), (11, 'nov_2012'), (12, 'dec_2012')
       ) m(ord, mon)  -- ord keeps the months in calendar order
       CROSS JOIN (SELECT DISTINCT testname FROM tests) t
       LEFT JOIN (
          SELECT testname
                ,to_char(last_update, 'mon_YYYY') AS mon
          FROM   tests
          WHERE  current_status = 'FAILED'
          AND    last_update >= '2012-01-01 0:0'
          AND    last_update <  '2013-01-01 0:0'  -- proper date range!
          ) x USING (testname, mon)  -- match on test and month, not month alone
       GROUP  BY t.testname, m.ord, m.mon
       -- one-parameter crosstab() fills the columns in row order,
       -- so sort months by calendar position, not by name
       ORDER  BY t.testname, m.ord$$
     )
AS ct (testname  text
   , jan_2012 int, feb_2012 int, mar_2012 int
   , apr_2012 int, may_2012 int, jun_2012 int
   , jul_2012 int, aug_2012 int, sep_2012 int
   , oct_2012 int, nov_2012 int, dec_2012 int);
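
As a rough sketch of the generate_series() idea for a bigger range, the month labels could be generated instead of typed out. The alias m and the range bounds below are placeholders picked for illustration. The column definition list after AS ct still has to be written by hand, since crosstab() needs a static return type.

```sql
-- One label per month in calendar order: jan_2012, feb_2012, ... dec_2012
SELECT to_char(m, 'mon_YYYY') AS mon
FROM   generate_series(timestamp '2012-01-01'
                     , timestamp '2012-12-01'
                     , interval  '1 month') AS m
ORDER  BY m;
```

Sorting by the generated timestamp rather than by the label keeps the months in calendar order.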

Test case with sample data

Here is a test case with some sample data that the OP failed to provide. I used this to test it and make it work.

CREATE TEMP TABLE tests (
  id             bigserial PRIMARY KEY
 ,testname       text NOT NULL
 ,last_update    timestamp without time zone NOT NULL DEFAULT now()
 ,current_status text NOT NULL
 );

INSERT INTO tests (testname, last_update, current_status)
VALUES
  ('foo', '2012-12-05 21:01', 'FAILED')
 ,('foo', '2012-12-05 21:01', 'FAILED')
 ,('foo', '2012-11-05 21:01', 'FAILED')
 ,('bar', '2012-02-05 21:01', 'FAILED')
 ,('bar', '2012-02-05 21:01', 'FAILED')
 ,('bar', '2012-03-05 21:01', 'FAILED')
 ,('bar', '2012-04-05 21:01', 'FAILED')
 ,('bar', '2012-05-05 21:01', 'FAILED');
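
To sanity-check the sample data before pivoting, the inner query of the first crosstab() call can be run on its own. This is only a quick check against the temp table above, not part of the solution itself:

```sql
-- Failures per test and month for 2012, unpivoted.
SELECT testname, to_char(last_update, 'mon_YYYY') AS mon, count(*)::int AS ct
FROM   tests
WHERE  current_status = 'FAILED'
AND    last_update >= '2012-01-01 0:0'
AND    last_update <  '2013-01-01 0:0'
GROUP  BY 1,2
ORDER  BY 1,2;

-- Result with the sample rows above:
--  testname |   mon    | ct
-- ----------+----------+----
--  bar      | apr_2012 |  1
--  bar      | feb_2012 |  2
--  bar      | mar_2012 |  1
--  bar      | may_2012 |  1
--  foo      | dec_2012 |  2
--  foo      | nov_2012 |  1
```

Note that the months sort alphabetically here, and months without failures are simply missing. The two-parameter form of crosstab() handles both, because it matches each value to its column by category name.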

Upvotes: 1
