Reputation: 2409
I have the following table log
:
event_time | name | ------------------------- 2014-07-16 11:40 Bob 2014-07-16 10:00 John 2014-07-16 09:20 Bob 2014-07-16 08:20 Bob 2014-07-15 11:20 Bob 2014-07-15 10:20 John 2014-07-15 09:00 Bob
I would like to generate a report, where I can group data by number of entries per day and by entry day. So the resulting report for the table above would be something like this:
event_date | 0-2 | 3 | 4-99 | ------------------------------- 2014-07-16 1 1 0 2014-07-15 2 0 0
I use the following approached to solve it:
If I find answer before anybody post it here, I will share it.
I would like to count a number of daily entries for each name
. Then I check to which column this value belongs to, and the I add 1 to that column.
Upvotes: 1
Views: 125
Reputation: 13056
This is a variation on a PIVOT
query (although PostgreSQL supports this via the crosstab(...)
table functions). The existing answers cover the basic technique, I just prefer to construct queries without the use of CASE
, where possible.
To get started, we need a couple of things. The first is essentially a Calendar Table, or entries from one (if you don't already have one, they're among the most useful dimension tables). If you don't have one, the entries for the specified dates can easily be generated:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-01' AS DATE),
CAST('2014-08-01' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay))
This is primarily used to create the first step in the double aggregate, like so:
SELECT Calendar_Range.startOfDay, COUNT(Log.name)
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name
Remember that most aggregate columns with a nullable expression (here, COUNT(Log.name)
) will ignore null
values (not count them). This is also one of the few times it's acceptable to not include a grouped-by column in the SELECT
list (normally it makes the results ambiguous). For the actual queries I'll put this into a subquery, but it would also work as a CTE.
We also need a way to construct our COUNT
ranges. That's pretty easy too:
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
We'll be querying these as "exclusive upper-bound" as well.
We now have all the pieces we need to do the query. We can actually use these virtual tables to make queries in both veins of the current answers.
First, the SUM(CASE...)
style.
For this query, we'll take advantage of the null-ignoring qualities of aggregate functions again:
WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT startOfDay,
COUNT(Zero_To_Two.text) AS Zero_To_Two,
COUNT(Three.text) AS Three,
COUNT(Four_And_Up.text) AS Four_And_Up
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
LEFT JOIN Count_Range Zero_To_Two
ON Zero_To_Two.text = '0 - 2'
AND Entry_Count.count >= Zero_To_Two.start
AND Entry_Count.count < Zero_To_Two.next
LEFT JOIN Count_Range Three
ON Three.text = '3'
AND Entry_Count.count >= Three.start
AND Entry_Count.count < Three.next
LEFT JOIN Count_Range Four_And_Up
ON Four_And_Up.text = '4+'
AND Entry_Count.count >= Four_And_Up.start
GROUP BY startOfDay
ORDER BY startOfDay
The other option is of course the crosstab
query, where the CASE
was being used to segment the results. We'll use the Count_Range
table to decode the values for us:
SELECT startOfDay, "0 -2", "3", "4+"
FROM CROSSTAB($$WITH Calendar_Range AS (SELECT startOfDay, startOfDay + INTERVAL '1 DAY' AS nextDay
FROM GENERATE_SERIES(CAST('2014-07-14' AS DATE),
CAST('2014-07-17' AS DATE),
INTERVAL '1 DAY') AS dr(startOfDay)),
Count_Range AS (SELECT text, start, LEAD(start) OVER(ORDER BY start) as next
FROM (VALUES('0 - 2', 0),
('3', 3),
('4+', 4)) e(text, start))
SELECT Calendar_Range.startOfDay, Count_Range.text, COUNT(*) AS count
FROM (SELECT Calendar_Range.startOfDay, COUNT(Log.name) AS count
FROM Calendar_Range
LEFT JOIN Log
ON Log.event_time >= Calendar_Range.startOfDay
AND Log.event_time < Calendar_Range.nextDay
GROUP BY Calendar_Range.startOfDay, Log.name) Entry_Count
JOIN Count_Range
ON Entry_Count.count >= Count_Range.start
AND (Entry_Count.count < Count_Range.end OR Count_Range.end IS NULL)
GROUP BY Calendar_Range.startOfDay, Count_Range.text
ORDER BY Calendar_Range.startOfDay, Count_Range.text$$,
$$VALUES('0 - 2', '3', '4+')$$) Data(startOfDay DATE, "0 - 2" INT, "3" INT, "4+" INT)
(I believe this is correct, but don't have a way to test it - Fiddle doesn't seem to have the crosstab functionality loaded. In particular, CTEs probably must go inside the function itself, but I'm not sure....)
Upvotes: 1
Reputation: 656391
First aggregate in two steps:
SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2;
Names should be completely irrelevant according to your description.
Then take the query and run it through crosstab()
:
SELECT *
FROM crosstab(
$$SELECT day, CASE
WHEN ct < 3 THEN '0-2'
WHEN ct > 3 THEN '4_or_more'
ELSE '3'
END AS cat
,count(*)::int AS val
FROM (
SELECT event_time::date AS day, count(*) AS ct
FROM tbl
GROUP BY 1
) sub
GROUP BY 1,2
ORDER BY 1,2$$
,$$VALUES ('0-2'::text), ('3'), ('4_or_more')$$
) AS f (day date, "0-2" int, "3" int, "4_or_more" int);
crosstab()
is supplied by the additional module tablefunc
. Details and instructions in this related answer:
PostgreSQL Crosstab Query
Upvotes: 1
Reputation: 7837
I took it in two steps. Inner query gets the base counts. The outer query uses case statements to sum counts.
select event_date,
sum(case when cnt between 0 and 2 then 1 else 0 end) as "0-2",
sum(case when cnt = 3 then 1 else 0 end) as "3",
sum(case when cnt between 4 and 99 then 1 else 0 end) as "4-99"
from
(select cast(event_time as date) as event_date,
name,
count(1) as cnt
from log
group by cast(event_time as date), name) baseCnt
group by event_date
order by event_date
Upvotes: 2
Reputation: 4487
try like this
select da,sum(case when c<3 then 1 else 0 end) as "0-2",
sum(case when c=3 then 1 else 0 end) as "3",
sum(case when c>3 then 1 else 0 end) as "4-66" from (
select cast(event_time as date) as da,count(*) as c from
table1 group by cast(event_time as date),name) as aa group by da
Upvotes: 1