psrpsrpsr
psrpsrpsr

Reputation: 457

How can I return rows for IDs that meet criteria for EXACTLY 3 occurrences in a row?

I have a requirement that is throwing me for a loop.I have to return location IDs that have exactly 3 consecutive rows of monthly invoice amounts >$2,000. In other words, I do not want to return IDs for mature locations (that might have hundreds of monthly invoices rows).

Rextester example data: http://rextester.com/CNJC15871

Info:

Desired Output: In the invoice table below,

I also have a loc_id dimension that includes the location open date.

| loc_id | loc_open_dt |
|--------|-------------|
| 1234   | 2002-05-01  |
| 9987   | 2006-10-22  |
| 5544   | 2015-03-04  |
| 2223   | 2017-05-05  |
| 3344   | 2017-07-05  |
| 6678   | 2017-08-01  | 

Upvotes: 0

Views: 136

Answers (2)

trincot
trincot

Reputation: 351308

Here is query that checks that:

  • the months with >=2000 amounts are consecutive, and
  • the month preceding the listing date is the last one of those, and
  • if there is an amount for the month preceding those three, it is below 2000

Query:

select distinct loc_id
from   (
        select loc_id, 
               first_value(invoice_amt) over win                            first_amt,
               floor((list_date - first_value(invoice_date) over win)/30)+1 month_count,
               list_date - last_value(invoice_date) over win < 30           has_last_month,
               count(case when invoice_amt >= 2000 then 1 end) over win     large_amt_count
        from   invoices,
               (select date '2017-10-01' /* current_date */ list_date) ref 
        where  invoice_date between (list_date - 120) and list_date
        window win as (partition by loc_id order by invoice_date)
       ) base
where  month_count = 3 + (first_amt < 2000)::int
   and large_amt_count = 3
   and has_last_month;

See it run on rextester

Change the literal date in the middle of the query to your actual reporting date (or current_date).

Upvotes: 1

fog
fog

Reputation: 3391

In PostgreSQL you can use a window function. You need to build the windows by aggregating your data by loc_id and then check that 3 conscutive rows have an invoice_amt greater than the target value. The trick is done by using the lag() function that, applied over a window, let you grab data from the previous rows. The code is much simpler than the explanation:

SELECT DISTINCT loc_id FROM (
  SELECT *, 
         invoice_amt > 10000 AS a, 
         lag(invoice_amt, 1) OVER w > 10000 AS b,
         lag(invoice_amt, 2) OVER w > 10000 AS c,
         extract('month' from invoice_date::date) AS m1, 
         extract('month' from (lag(invoice_date, 1) OVER w)::date + '1 month'::interval) AS m2, 
         extract('month' from (lag(invoice_date, 2) OVER w)::date + '2 month'::interval) AS m3
    FROM invoices 
  WINDOW w AS (PARTITION BY loc_id ORDER BY invoice_date)
) X 
 WHERE a AND b AND c AND m2 = m1 AND m3 = m1

Note also the check on consecutive months. We just add 1 or 2 months to the lag()ged date and then check that the month is same for the three consecutive rows (as asked in the comments).

If you want to better understand how that works just run only the inner SELECT and look at the results.

Upvotes: 1

Related Questions