Making intervals on value changes out of an ordered table by value

I have a specific task using PostgreSQL, which i need to come with a SQL query, but unfortunately, i'm not able to get a right result.

The example of the table i have:

----------------------------------------------
|    ID     |    value    |    usage-date    |
----------------------------------------------
|     1     |    value1   |    2020-09-10    |
----------------------------------------------
|     1     |    value1   |    2020-09-15    |
----------------------------------------------
|     1     |    value1   |    2020-09-20    |
----------------------------------------------
|     1     |    value1   |    2020-09-23    |
----------------------------------------------
|     1     |    value1   |    2020-09-25    |
----------------------------------------------
|     1     |    value1   |    2020-09-30    |
----------------------------------------------
|     1     |    value2   |    2020-09-15    |
----------------------------------------------
|     1     |    value2   |    2020-09-20    |
----------------------------------------------
|     1     |    value2   |    2020-09-23    |
----------------------------------------------
|     1     |    value2   |    2020-09-25    |
----------------------------------------------

So the table is ordered by ID, value and usage-date. My task is to extract intervals that should tell from which date to which date a certain value was active. So the output would look like following:

--------------------------------------------------------------
|    ID     |    value    |    start-date    |    end-date   |
--------------------------------------------------------------
|     1     |    value1   |    2020-09-10    |   2020-09-15  |
--------------------------------------------------------------
|     1     |    value2   |    2020-09-15    |   2020-09-20  |
--------------------------------------------------------------
|     1     |    value1   |    2020-09-20    |   2020-09-23  |
--------------------------------------------------------------
|     1     |    value2   |    2020-09-23    |   2020-09-25  |
--------------------------------------------------------------
|     1     |    value1   |    2020-09-25    |   2020-09-30  |
--------------------------------------------------------------

Anyone some idea how is it possible to do it?

Here is an SQL fiddle so anyone can try.

Upvotes: 2

Views: 117

Answers (2)

user3315556
user3315556

Reputation: 175

Your input does not coincide with the expected output , there is no clear distinction between value 1 and value2 interms of expected date range, it seems you just want to brute force the output.

Answer: Rather than making multiple entries for each value if you want a close range of dates for that value you can use the below query:

  select distinct one1.id,  vals.value1, one.minn as start_date, two.maxx as end_date from 
   table one1 inner join 
    (select value1 vals, min(user_date) as minn from table group by value1) one 
    on 
one1.value = one.vals
inner join (
    select value1 valm, max(user_date) as maxx from table group by value1) two 
    on one.vals = two.valm

Upvotes: 1

S-Man
S-Man

Reputation: 23666

step-by-step demo:db<>fiddle (click)

I guess, that two rows in your example are one for "start" and one for "end" of the activity. Because you didn't show any additional identifier, the query is more complex than it has to be. Because of this, we first have to remove the "end" rows manually.

SELECT
    id,
    value,
    usagedate,
    COALESCE(                                                                   -- 4
        lead(usagedate) OVER (PARTITION BY id ORDER BY usagedate),              -- 3
        first_value                                                             -- 4
    )
FROM (
    SELECT 
        *, 
        row_number() OVER (PARTITION BY id, value ORDER BY usagedate),          -- 1
        first_value(usagedate) OVER (PARTITION BY id ORDER BY usagedate DESC)   -- 2
    FROM
        t
)s
WHERE row_number % 2 = 1                                                        -- 1
  1. add a row count, an identifier to the table. The idea is to remove every second row later (the "end" records). This can be done by the row_number() window function
  2. We need to keep the last date value (we need it later). This can be done with fetching the first value in descending order (== last value in normal ascending order). Here we use an appropriate window function as well.
  3. After removing every second row (see the WHERE clause), we can use the lead() window function to get the value of every ordered next row into the current one. So, we are able to use the start date of every new activity as end date of the current one. This is, why you, actually, don't need the "end" rows and we deleted them.
  4. Only the very last records could make a problem. Because there is no next record after the last one, the lead() function returns NULL. To avoid this, we fetched the very last record (the end of the last activity) previously. With the COALESCE() function this can be achieved. It takes the first not null value in its parameter list, which is the lead value if it is not the last one, the first_value if it is

Upvotes: 1

Related Questions