Reputation: 447
So I am trying to calculate overview statistics into JSON, but am having trouble wrangling them into a query.
There are 2 tables:
appointments
- time timestamp
- patients int
assignments
- user_id int
- appointment_id int
I want to calculate the number of patients by user, by hour for the day. Ideally, it would look like this:
[
{hour: "2015-07-01T08:00:00.000Z", assignments: [
{user_id: 123, patients: 3},
{user_id: 456, patients: 10},
{user_id: 789, patients: 4},
]},
{hour: "2015-07-01T09:00:00.000Z", assignments: [
{user_id: 456, patients: 1},
{user_id: 789, patients: 6}
]},
{hour: "2015-07-01T10:00:00.000Z", assignments: []}
...
]
I got kind of close:
with assignments_totals as (
select user_id,sum(patients),date_trunc('hour',appointments.time) as hour
from assignments
inner join appointments on appointments.id = assignments.appointment_id
group by date_trunc('hour',sales.time),user_id
), hours as (
select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, array_to_json(array_agg(DISTINCT assignment_totals)) as patients
from appointments
left join assignment_totals on date_trunc('hour',sales.time) = assignment_totals.hour
where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
group by date_trunc('hour',time)
order by date_trunc('hour',time)
)
select array_to_json(array_agg(hours)) as hours from hours;
Which outputs:
[
{hour: "2015-07-01T08:00:00.000Z", assignments: [
{user_id: 123, patients: 3, hour: "2015-07-01T08:00:00.000Z" },
{user_id: 456, patients: 10, hour: "2015-07-01T08:00:00.000Z"},
{user_id: 789, patients: 4, hour: "2015-07-01T08:00:00.000Z"},
]},
{hour: "2015-07-01T09:00:00.000Z", assignments: [
{user_id: 456, patients: 1, hour: "2015-07-01T09:00:00.000Z"},
{user_id: 789, patients: 6, hour: "2015-07-01T09:00:00.000Z"}
]},
{hour: "2015-07-01T10:00:00.000Z", assignments: [null]}
...
]
While this works, there are 2 issues, which may or may not be independent of each other:
I wanted to do something like
hours as (
select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, sum(appointments.patients) OVER(partition by assignments.user_id) as appointments
from appointments
left join assignments on appointments.id = assignments.appointment_id
where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
group by date_trunc('hour',time)
order by date_trunc('hour',time)
)
select array_to_json(array_agg(hours)) as hours from hours
but i can't get it to work without giving me a "attribute must be in the group by or aggregate function error.
Anyone know how to fix any of these issues? Thanks in advance!
Upvotes: 1
Views: 477
Reputation: 447
Most of my frustration with this came because I was not looking at the Postgres 9.4 documentation, which has new functions for dealing with json.
The solution I found builds upon the original query, but then breaks the assignments array down using json_array_elements, filters using where, then builds it back up again. It seems pointless to have essentially:
json_agg(json_array_elements(json_agg(*)))
But it makes very little performance difference and gets me where I need to go. Feel free to comment if you find a better solution! It should also be possible in <9.4 using array_agg and unnest but I was having trouble because I was trying to unnest a record type returned from my CTE, instead of an actual row type with column definitions.
with assignment_totals as (
select
date_trunc('hour',appointments.time) as hour,
user_id,
coalesce(sum(patients),0) as patients
from appointments
left outer join assignments on appointment.id = assignments.appointment_id
where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
group by date_trunc('hour',appointments.time),user_id
), hours as (
select
to_char(assignment_totals.hour,'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
(
select coalesce(json_agg(json_build_object('user_id',(t->'user_id'),'patients',(t->'patients')) order by (t->>'user_id')),'[]'::json)
from json_array_elements(json_agg(assignment_totals)) t
where (t->>'patients') != '0'
) as patients
from assignment_totals
group by assignment_totals.hour
order by assignment_totals.hour
)
select array_to_json(array_agg(hours)) as hours from hours
Thanks to Andrew for pointing out that I can coalesce nulls to 0. But I still want to filter out entries where patients = 0. This solves all my problems by giving me the ability to filter them out with a where, and then gives me the ability to take out the time by building a new json object with json_build_object.
Upvotes: 1
Reputation: 14447
The main issue with your last query seems to be in conflating window functions with aggregate functions. Window functions use the OVER
syntax, and they do not in themselves require GROUP BY
when there are other fields in the SELECT
clause. Aggregate functions, on the other hand, use GROUP BY
when there are other (non-aggregate-function) fields in the SELECT
clause. One practical consequence of this difference is that window functions are not automatically DISTINCT
.
The issue with NULL
values resulting from the window function can be resolved with a simple COALESCE
such that zero is used instead of null.
So, to write your query using a window function, use something like:
WITH hours AS
(
SELECT DISTINCT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH:00:00.000Z') AS hour,
COALESCE(SUM(ap.patients) OVER (PARTITION BY asgn.user_id), 0) AS appointment_count
FROM appointments ap
LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
WHERE ap.time >= '2015-07-01T07:00:00.000Z'
AND ap.time < '2015-07-02T07:00:00.000Z'
)
SELECT array_to_json(array_agg(hours)) AS hours
FROM hours
ORDER BY hour
With an aggregate function:
WITH hours AS
(
SELECT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH:00:00.000Z') AS hour,
SUM(COALESCE(ap.patients, 0)) AS appointment_count,
asgn.user_id
FROM appointments ap
LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
WHERE ap.time >= '2015-07-01T07:00:00.000Z'
AND ap.time < '2015-07-02T07:00:00.000Z'
GROUP BY asgn.user_id, to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH:00:00.000Z')
)
SELECT array_to_json(array_agg(hours)) AS hours
FROM hours
ORDER BY hour
My syntax may not be quite correct, so double-check before using this solution or one like it (and feel free to edit to correct any errors).
Upvotes: 1