jmbull
jmbull

Reputation: 25

Postgresql: Create a date sequence, use it in date range query

I'm not great with SQL but I have been making good progress on a project up to this point. Now I am completely stuck.

I'm trying to get a count for the number of apartments with each status. I want this information for each day so that I can trend it over time. I have data that looks like this:

table: y_unit_status

unit | date_occurred | start_date | end_date   | status
1    | 2017-01-01    | 2017-01-01 | 2017-01-05 | Occupied No Notice
1    | 2017-01-06    | 2017-01-06 | 2017-01-31 | Occupied Notice
1    | 2017-02-01    | 2017-02-01 |            | Vacant
2    | 2017-01-01    | 2017-01-01 |            | Occupied No Notice

And I want to get output that looks like this:

date       | occupied_no_notice | occupied_notice | vacant
2017-01-01 | 2                  | 0               | 0
...
2017-01-10 | 1                  | 1               | 0
...
2017-02-01 | 1                  | 0               | 1

Or, this approach would work:

date       | status             | count
2017-01-01 | occupied no notice | 2
2017-01-01 | occupied notice    | 0

date_occurred: Date when the status of the unit changed start_date: Same as date_occurred end_date: Date when status stopped being x and changed to y.

I am pulling in the number of bedrooms and a property id so the second approach of selecting counts for one status at a time would produce a relatively large number of rows vs. option 1 (if that matters).

I've found a lot of references that have gotten me close to what I'm looking for but I always end up with a sort of rolling, cumulative count.

Here's my query, which produces a column of dates and counts, which accumulate over time rather than reflecting a snapshot of counts for a particular day. You can see my references to another table where I'm pulling in a property id. The table schema is Property -> Unit -> Unit Status.

WITH t AS(
    SELECT i::date from generate_series('2016-06-29', '2017-08-03', '1 day'::interval) i
    )

SELECT t.i as date,
u.hproperty,
count(us.hmy) as count --us.hmy is the id
FROM t
LEFT OUTER JOIN y_unit_status us ON t.i BETWEEN us.dtstart AND 
us.dtend
INNER JOIN y_unit u ON u.hmy = us.hunit -- to get property id
WHERE us.sstatus = 'Occupied No Notice'
AND t.i >= us.dtstart
AND t.i <= us.dtend
AND u.hproperty = '1'
GROUP BY t.i, u.hproperty
ORDER BY t.i
limit 1500

I also tried a FOR loop, iterating over the dates to determine cases where the date was between start and end but my logic wasn't working. Thanks for any insight!

Upvotes: 1

Views: 1650

Answers (1)

pozs
pozs

Reputation: 36244

You are on the right track, but you'll need to handle NULL values in end_date. If those means that status is assumed to be changed somewhere in the future (but not sure when it will change), the containment operators (@> and <@) for the daterange type are perfect for you (because ranges can be "unbounded"):

with params as (
  select date '2017-01-01' date_from,
         date '2017-02-02' date_to
)
select     date_from + d, status, count(unit)
from       params
cross join generate_series(0, date_to - date_from) d
left join  y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by   1, 2

To achieve the first variant, you can use conditional aggregation:

with params as (
  select date '2017-01-01' date_from,
         date '2017-02-02' date_to
)
select     date_from + d,
           count(unit) filter (where status = 'Occupied No Notice') occupied_no_notice,
           count(unit) filter (where status = 'Occupied Notice') occupied_notice,
           count(unit) filter (where status = 'Vacant') vacant
from       params
cross join generate_series(0, date_to - date_from) d
left join  y_unit_status on daterange(start_date, end_date, '[]') @> date_from + d
group by   1

Notes:

  • The syntax filter (where <predicate>) is new to 9.4+. Before that, you can use CASE (and the fact that most aggregate functions does not include NULL values) to emulate it.
  • You can even index the expression daterange(start_date, end_date, '[]') (using gist) for better performance.

http://rextester.com/HWKDE34743

Upvotes: 3

Related Questions