Rajeev kumar
Rajeev kumar

Reputation: 163

SQL query to add start and end date for a between two rows based on the input

I have a table named Acount has following structure

  acc_id      name       value   phase   date

    1         acc_type   trial   start    t1
    1         name1      value1  phase1   t2
    1         acc_type   trial   end      t3
    1         name2      value2  phase2   t4
    1         acc_type   trial   start    t5
    1         name3      value3  phase2   t6
    1         name4      value4  phase4   t7
    1         acc_type   trial   end      t8   

Table is sorted based on the date

What I want is to add two column start_date and end_date and update the column for a particular pair of phase value start and end where name is acc_type and value is trial.

So output will look like:

acc_id      name       value   phase   date    start_date    end_date

    1         acc_type   trial   start    t1      t1          t3
    1         name1      value1  phase1   t2      t1          t3
    1         acc_type   trial   end      t3      t1          t3
    1         name2      value2  phase2   t4      
    1         acc_type   trial   start    t5      t5          t8 
    1         name3      value3  phase2   t6      t5          t8
    1         name4      value4  phase4   t7      t5          t8
    1         acc_type   trial   end      t8      t5          t8

I am using PostgreSQL.

Upvotes: 0

Views: 389

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269753

This is complicated, because you need to partition by the starts and ends. If I assume that the proper ordering is the dates, then you can use cumulative max():

select t.*,
       max(case when phase = 'start' then date end) over
           (partition by acc_id order by date) as start_date,
       max(case when phase = 'end' then date end) over
           (partition by acc_id order by date) as end_date
from t;

EDIT:

I notice that there are values "in-between". This requires more complex logic. The idea is to count the net number of starts and ends, and to use this for filtering:

select t.*,
       (case when netstartend > 0 or phase in ('start', 'end')
             then max(case when phase = 'start' then date end) over
                      (partition by acc_id order by date)
        end) as start_date,
       (case when netstartend > 0 or phase in ('start', 'end')
             then max(case when phase = 'end' then date end) over
                      (partition by acc_id order by date)
        end) as end_date
from (select t.*,
             sum(case when phase = 'start' then 1 when phase = 'end' then -1 else 0 end) over
                 (order by date) as netstartend
      from t
     ) t;

Upvotes: 2

Related Questions