Worker
Worker

Reputation: 2409

SQL: grouping by number of entries and entry date

I have the following table log:

event_time       | name |
-------------------------
2014-07-16 11:40    Bob
2014-07-16 10:00   John
2014-07-16 09:20    Bob
2014-07-16 08:20    Bob

2014-07-15 11:20    Bob
2014-07-15 10:20   John
2014-07-15 09:00    Bob

I would like to generate a report, where I can group data by number of entries per day and by entry day. So the resulting report for the table above would be something like this:

event_date   | 0-2 | 3 | 4-99 |
-------------------------------
2014-07-16      1    1      0
2014-07-15      2    0      0

I use the following approached to solve it:

If I find answer before anybody post it here, I will share it.

Added

I would like to count a number of daily entries for each name. Then I check to which column this value belongs to, and the I add 1 to that column.

Upvotes: 1

Views: 125

Answers (4)

Clockwork-Muse
Clockwork-Muse

Reputation: 13056

This is a variation on a PIVOT query (although PostgreSQL supports this via the crosstab(...) table functions). The existing answers cover the basic technique, I just prefer to construct queries without the use of CASE, where possible.

To get started, we need a couple of things. The first is essentially a Calendar Table, or entries from one (if you don't already have one, they're among the most useful dimension tables). If you don't have one, the entries for the specified dates can easily be generated:

WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
                                             CAST('2014-08-01' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay))

SQL Fiddle Demo

This is primarily used to create the first step in the double aggregate, like so:

SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
       ON Log.event_time >= Calendar_Range.startOfDay
          AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name

SQL Fiddle Demo

Remember that most aggregate columns with a nullable expression (here, COUNT(Log.name)) will ignore null values (not count them). This is also one of the few times it's acceptable to not include a grouped-by column in the SELECT list (normally it makes the results ambiguous). For the actual queries I'll put this into a subquery, but it would also work as a CTE.

We also need a way to construct our COUNT ranges. That's pretty easy too:

     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                     FROM (VALUES('0 - 2', 0),
                                 ('3', 3),
                                 ('4+', 4)) e(text, start))

SQL Fiddle Demo

We'll be querying these as "exclusive upper-bound" as well.

We now have all the pieces we need to do the query. We can actually use these virtual tables to make queries in both veins of the current answers.


First, the SUM(CASE...) style.
For this query, we'll take advantage of the null-ignoring qualities of aggregate functions again:

WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                             CAST('2014-07-17' AS DATE),
                                             INTERVAL '1 DAY') AS dr(startOfDay)),
     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                     FROM (VALUES('0 - 2', 0),
                                 ('3', 3),
                                 ('4+', 4)) e(text, start))
SELECT startOfDay, 
       COUNT(Zero_To_Two.text) AS Zero_To_Two, 
       COUNT(Three.text) AS Three, 
       COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
      FROM Calendar_Range
      LEFT JOIN Log
             ON Log.event_time >= Calendar_Range.startOfDay
                AND Log.event_time < Calendar_Range.nextDay
      GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
       ON Zero_To_Two.text = '0 - 2'
          AND Entry_Count.count >= Zero_To_Two.start 
          AND Entry_Count.count < Zero_To_Two.next 
LEFT JOIN Count_Range Three
       ON Three.text = '3'
          AND Entry_Count.count >= Three.start 
          AND Entry_Count.count < Three.next 
LEFT JOIN Count_Range Four_And_Up
       ON Four_And_Up.text = '4+'
          AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay

SQL Fiddle Example


The other option is of course the crosstab query, where the CASE was being used to segment the results. We'll use the Count_Range table to decode the values for us:

SELECT startOfDay, "0 -2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
                                        FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
                                                             CAST('2014-07-17' AS DATE),
                                                             INTERVAL '1 DAY') AS dr(startOfDay)),
                     Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
                                     FROM (VALUES('0 - 2', 0),
                                                 ('3', 3),
                                                 ('4+', 4)) e(text, start))
                SELECT Calendar_Range.startOfDay, Count_Range.text, COUNT(*) AS count
                FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
                      FROM Calendar_Range
                      LEFT JOIN Log
                             ON Log.event_time >= Calendar_Range.startOfDay
                                AND Log.event_time < Calendar_Range.nextDay
                      GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
                JOIN Count_Range
                  ON Entry_Count.count >= Count_Range.start
                     AND (Entry_Count.count < Count_Range.end OR Count_Range.end IS NULL)
                GROUP BY Calendar_Range.startOfDay, Count_Range.text
                ORDER BY Calendar_Range.startOfDay, Count_Range.text$$,
              $$VALUES('0 - 2', '3', '4+')$$) Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)

(I believe this is correct, but don't have a way to test it - Fiddle doesn't seem to have the crosstab functionality loaded. In particular, CTEs probably must go inside the function itself, but I'm not sure....)

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656391

First aggregate in two steps:

SELECT day, CASE
               WHEN ct < 3 THEN '0-2'
               WHEN ct > 3 THEN '4_or_more'
               ELSE '3'
            END AS cat
      ,count(*)::int AS val
FROM  (
   SELECT event_time::date AS day, count(*) AS ct
   FROM   tbl
   GROUP  BY 1
   ) sub
GROUP  BY 1,2
ORDER  BY 1,2;

Names should be completely irrelevant according to your description.
Then take the query and run it through crosstab():

SELECT *
FROM   crosstab(
   $$SELECT day, CASE
                   WHEN ct < 3 THEN '0-2'
                   WHEN ct > 3 THEN '4_or_more'
                   ELSE '3'
                 END AS cat
           ,count(*)::int AS val
   FROM  (
      SELECT event_time::date AS day, count(*) AS ct
      FROM   tbl
      GROUP  BY 1
      ) sub
   GROUP BY 1,2
   ORDER BY 1,2$$

   ,$$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
   ) AS f (day date, "0-2" int, "3" int, "4_or_more" int);

crosstab() is supplied by the additional module tablefunc. Details and instructions in this related answer:
PostgreSQL Crosstab Query

Upvotes: 1

SQLChao
SQLChao

Reputation: 7837

I took it in two steps. Inner query gets the base counts. The outer query uses case statements to sum counts.

SQL Fiddle Example

select event_date,
  sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
  sum(case when cnt = 3 then 1 else 0 end) as "3",
  sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from 
    (select cast(event_time as date) as event_date, 
      name,
      count(1) as cnt
    from log
    group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date

Upvotes: 2

Sathish
Sathish

Reputation: 4487

try like this

 select da,sum(case when c<3 then 1 else 0 end) as "0-2",
sum(case when c=3 then 1 else 0 end) as "3",
sum(case when c>3 then 1 else 0 end) as "4-66" from (
select cast(event_time as date) as da,count(*) as c from 
table1 group by cast(event_time  as date),name) as aa group by da 

Upvotes: 1

Related Questions