Patthebug

Reputation: 4787

Relative row_number() in PostgreSQL

I have a dataset that looks like this:

SessionID  URL                   created     rownum
abc        www.google.com/page1  2016-08-01  1
abc        www.google.com/page2  2016-08-02  2
abc        www.google.com/blah   2016-08-03  3
abc        www.google.com/page3  2016-08-04  4
abc        www.google.com/page4  2016-08-05  5

and I would like to have an output that looks like this:

SessionID  URL                   created     rownum  newrownum
abc        www.google.com/page1  2016-08-01  1       -2
abc        www.google.com/page2  2016-08-02  2       -1
abc        www.google.com/blah   2016-08-03  3        0
abc        www.google.com/page3  2016-08-04  4        1
abc        www.google.com/page4  2016-08-05  5        2

In other words, I would like to find where a session visits a particular page, blah, and see the pages visited before and after it. Note that the pages in a session are ordered by the created field.

I generated the rownum column using the following query:

select 
    sessionid, 
    url, 
    created,
    row_number() over(partition by sessionid order by created) as rownum
from 
    <tablename>
order by 
    sessionid, 
    created

I'm not sure how to get row numbers that are negative relative to a certain page. I need this so I can plot the kinds of pages that are visited before and after the page blah. Of course, there can be multiple visits to the page blah in the same session. In that case, the first visit should be treated as the zeroth visit (with a value of 0 in the newrownum column).

Upvotes: 2

Views: 505

Answers (3)

Vamsi Prabhala

Reputation: 49260

You can subtract the rownum of blah from the already calculated rownum column for each session. Using min() makes the first visit to blah the zero point, as the question asks.

select
    t.*,
    rownum - min(case when url like '%blah%' then rownum end) over (partition by sessionid) as newrownum
from (
    select
        sessionid,
        url,
        created,
        row_number() over (partition by sessionid order by created) as rownum
    from
        <tablename>
) t
order by
    sessionid,
    rownum

To match only URLs whose last path segment is exactly blah, rather than any URL containing blah, use a regex, something like url ~ '.+/blah$'.
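
As a quick illustration of the difference between the two predicates, here is a small sketch. The three URLs are hypothetical test values, not rows from the question's table.

```sql
-- Hypothetical sample URLs, chosen only to show how the two predicates differ.
select url,
       url like '%blah%' as like_match,   -- true for every row below
       url ~ '.+/blah$'  as regex_match   -- true only when the URL ends in /blah
from (values
        ('www.google.com/blah'),
        ('www.google.com/blahblah'),
        ('www.google.com/blah/page1')
     ) as v(url);
```

Only the first URL satisfies the regex, while all three satisfy the like pattern, so the looser pattern could pick the wrong row as the zero point.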

Upvotes: 2

Lexy Kassan

Reputation: 21

Apply an offset based on the row number of the page you want to use as your baseline (your row 0). In this case, blah is at rownum 3, so you could create a new field as rownum - 3 to get the values you are looking for.
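
A hard-coded 3 only fits the sample session. One way to look the offset up per session is a join against each session's baseline row number. This is a sketch under assumptions: the visits CTE stands in for the real table, and the names visits, numbered, baseline and base_rownum are made up for the example.

```sql
-- "visits" reproduces the question's sample rows; replace it with the real table.
with visits (sessionid, url, created) as (
    values
        ('abc', 'www.google.com/page1', date '2016-08-01'),
        ('abc', 'www.google.com/page2', date '2016-08-02'),
        ('abc', 'www.google.com/blah',  date '2016-08-03'),
        ('abc', 'www.google.com/page3', date '2016-08-04'),
        ('abc', 'www.google.com/page4', date '2016-08-05')
),
numbered as (
    select sessionid, url, created,
           row_number() over (partition by sessionid order by created) as rownum
    from visits
),
baseline as (
    -- row number of the first visit to the baseline page in each session
    select sessionid, min(rownum) as base_rownum
    from numbered
    where url like '%/blah'
    group by sessionid
)
select n.sessionid, n.url, n.created, n.rownum,
       n.rownum - b.base_rownum as newrownum
from numbered n
join baseline b on b.sessionid = n.sessionid
order by n.sessionid, n.created;
```

Because this uses an inner join, sessions that never visit blah drop out of the result. A left join would keep them with a null newrownum.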

Upvotes: 1

Gordon Linoff

Reputation: 1269773

Here is one method:

select sessionid, url, created, rownum,
       rownum - min(case when url = 'blah' then rownum end) over (partition by sessionid) as newrownum
from (select sessionid, url, created,
             row_number() over (partition by sessionid order by created) as rownum
      from <tablename>
     ) t
order by sessionid, created;

In other words, this calculates the row number for 'blah' using another window function. In more recent versions of Postgres (9.4 and later), you can use the filter clause rather than the case expression for the conditional min.

Note: This measures the offset from the first occurrence of 'blah', because of the min(). To use the last occurrence, use max() instead. Your actual logic will probably need something like url like '%blah' or the full pathname, since url = 'blah' does not match the sample URLs.
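
For reference, the filter variant might look like the sketch below. It assumes PostgreSQL 9.4 or later and a url like '%/blah' match. The visits CTE holds the question's sample rows with the last one changed to a hypothetical repeat visit to blah, to show that the first visit stays the zero point.

```sql
-- Sample rows from the question; the final row is a hypothetical second visit to /blah.
with visits (sessionid, url, created) as (
    values
        ('abc', 'www.google.com/page1', date '2016-08-01'),
        ('abc', 'www.google.com/page2', date '2016-08-02'),
        ('abc', 'www.google.com/blah',  date '2016-08-03'),
        ('abc', 'www.google.com/page3', date '2016-08-04'),
        ('abc', 'www.google.com/blah',  date '2016-08-05')
),
numbered as (
    select sessionid, url, created,
           row_number() over (partition by sessionid order by created) as rownum
    from visits
)
select sessionid, url, created, rownum,
       -- min() over only the blah rows gives the first blah visit per session
       rownum - min(rownum) filter (where url like '%/blah')
                            over (partition by sessionid) as newrownum
from numbered
order by sessionid, created;
```

With these rows, newrownum runs from -2 to 2, and the second blah visit gets 2 rather than 0. A session with no blah visit would get null in newrownum, because the filtered min() has nothing to aggregate.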

Upvotes: 3
