Group rows with same name only under certain conditions with SQL in redshift

Question

Now I have a table with webpage data look like this:

 Visitor_ID  Visit_ID  visit_time  pagepath         page seq    timestart        timeend
   0001       0111      2018-11-19  /homepage          1         ...              ...
   0001       0111      2018-11-19  /someotherpage     2         ...              ...
   0001       0111      2018-11-19  /product           3     2018-11-19 23:05  2018-11-19 23:15
   0001       0111      2018-11-19  /product           4     2018-11-19 23:15  2018-11-19 23:16
   0001       0111      2018-11-19  /product           5     2018-11-19 23:16  2018-11-19 23:17
   0001       0111      2018-11-19  /someotherpage     6         ...              ...   
   0001       0111      2018-11-19  /someotherpage     7         ...              ...  
   0001       0111      2018-11-19  /product           8     2018-11-19 23:25  2018-11-19 23:26

There is an error in the raw data, that some of the pages are duplicated, and what I want is to group the duplication, which is in sequence like the pages sequence 3,4,5 and sum up the time browse like:

 Visitor_ID  Visit_ID  visit_time  pagepath   page seq     timestart        timeend
   0001       0111      2018-11-19  /product      3     2018-11-19 23:05  2018-11-19 23:17
   ...
   ...
   0001       0111      2018-11-19  /product      8     2018-11-19 23:25  2018-11-19 23:26

I used the group by and take the minumum. However the problem is, in this case, page 8 is a vaild visit and it will also be grouped, which is not the desired result

I also tried to use lag and partition solution, but the problem is

I can just get the row after
And the number of replication is different between record, so cannot solve it by fixed code

In overall is, I want to group the record that with the same pagepath which is on sequence and sum them up, but at the same time keep the same pagepath which is not a duplicated record. Does anyone have any idea how to do so in redshift?

Many thanks for your help in advance

Group rows with same name only under certain conditions with SQL in redshift

Answers (1)

Related Questions