Reputation: 4787
I have a dataset that looks like this:
SessionID  URL                   created     rownum
abc        www.google.com/page1  2016-08-01  1
abc        www.google.com/page2  2016-08-02  2
abc        www.google.com/blah   2016-08-03  3
abc        www.google.com/page3  2016-08-04  4
abc        www.google.com/page4  2016-08-05  5
and I would like to have an output that looks like this:
SessionID  URL                   created     rownum  newrownum
abc        www.google.com/page1  2016-08-01  1       -2
abc        www.google.com/page2  2016-08-02  2       -1
abc        www.google.com/blah   2016-08-03  3       0
abc        www.google.com/page3  2016-08-04  4       1
abc        www.google.com/page4  2016-08-05  5       2
In other words, I would like to find where a session visits a particular page, blah, and see the pages visited before and after it. Note that the pages in a session are ordered by the created field.
I generated the rownum column using the following query:
select
    sessionid,
    url,
    created,
    row_number() over (partition by sessionid order by created) as rownum
from
    <tablename>
order by
    sessionid,
    created
I'm not sure how to get row numbers that are negative relative to a certain page. I need this so I can plot the kinds of pages visited before and after the page blah. Of course, there can be multiple visits to blah in the same session. In that case, the first visit should be treated as the zeroth visit (with a value of 0 in the newrownum column).
Upvotes: 2
Views: 505
Reputation: 49260
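You can subtract the rownum of the blah row from the already calculated rownum column for each session. Because the first visit to blah should be row 0, take the smallest matching rownum with min(); max() would anchor on the last visit instead.
select t.*, rownum - min(case when url like '%blah%' then rownum end) over (partition by sessionid) as newrownum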
from (
select
sessionid,
url,
created,
row_number() over(partition by sessionid order by created) as rownum
from
<tablename>
) t
order by rownum
To match only URLs whose path ends in blah, rather than any URL containing it, use a regex, something like url ~ '.+/blah$'
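The subtraction can be checked against the sample data from the question without a Postgres instance. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Postgres (a SQLite build with window function support is assumed), the table name visits is hypothetical, and like '%/blah' replaces the regex because SQLite has no ~ operator. It uses min() so that the first visit to blah becomes row 0.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; "visits" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (sessionid text, url text, created text)")
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        ("abc", "www.google.com/page1", "2016-08-01"),
        ("abc", "www.google.com/page2", "2016-08-02"),
        ("abc", "www.google.com/blah", "2016-08-03"),
        ("abc", "www.google.com/page3", "2016-08-04"),
        ("abc", "www.google.com/page4", "2016-08-05"),
    ],
)

query = """
select t.*,
       rownum - min(case when url like '%/blah' then rownum end)
                    over (partition by sessionid) as newrownum
from (
    select sessionid, url, created,
           row_number() over (partition by sessionid order by created) as rownum
    from visits
) t
order by sessionid, rownum
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# The last column of each row is newrownum.
offsets = [row[-1] for row in rows]
```

For a session that never visits blah, the conditional min() is NULL, so newrownum is NULL for every row of that session.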
Upvotes: 2
Reputation: 21
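Apply an offset based on the row number of the page you want to use as your baseline (your row 0). In this case, you could create a new field as rownum - 3 to achieve the values you are looking for. The 3 is the rownum of blah in this particular session, so for other sessions the offset has to be looked up per session rather than hard-coded.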
Upvotes: 1
Reputation: 1269773
Here is one method:
select sessionid, url, created, rownum,
(rownum - min(case when url = 'blah' then rownum end) over (partition by sessionid)) as newrownum
from (select sessionid, url, created,
row_number() over (partition by sessionid order by created) as rownum
from <tablename>
) t
order by sessionid, created;
In other words, this calculates the row number for 'blah' using another window function. In more recent versions of Postgres, you can use the filter keyword rather than the case expression for the conditional min.
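A sketch of what the filter variant might look like, run here through Python's sqlite3 module so it can be tried on the question's sample data. The assumptions: SQLite stands in for Postgres (both accept a filter clause between the aggregate call and over), the table name visits is hypothetical, and the match is written as like '%/blah' so that it hits the sample URLs.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; "visits" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (sessionid text, url text, created text)")
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        ("abc", "www.google.com/page1", "2016-08-01"),
        ("abc", "www.google.com/page2", "2016-08-02"),
        ("abc", "www.google.com/blah", "2016-08-03"),
        ("abc", "www.google.com/page3", "2016-08-04"),
        ("abc", "www.google.com/page4", "2016-08-05"),
    ],
)

# filter (where ...) restricts which rows feed min(); it replaces the case expression.
filter_query = """
select sessionid, url, created, rownum,
       rownum - min(rownum) filter (where url like '%/blah')
                    over (partition by sessionid) as newrownum
from (select sessionid, url, created,
             row_number() over (partition by sessionid order by created) as rownum
      from visits
     ) t
order by sessionid, created
"""

filter_offsets = [row[4] for row in conn.execute(filter_query)]
print(filter_offsets)
```

The result should be the same as with the case version; the filter form just keeps the condition separate from the aggregated column.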
Note: This computes the offset from the first occurrence of 'blah', because of the min(). To offset from the last occurrence, use max() instead. Also, your actual logic might use something like url like '%blah' or a comparison against the full pathname.
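To make the min() versus max() difference concrete, here is a small sketch with a session that visits blah twice. The data is made up for illustration and is not from the question; the table name visits is hypothetical, and SQLite (via Python's sqlite3) is used as a stand-in for Postgres.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("create table visits (sessionid text, url text, created text)")
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        ("abc", "www.google.com/page1", "2016-08-01"),
        ("abc", "www.google.com/blah", "2016-08-02"),
        ("abc", "www.google.com/page2", "2016-08-03"),
        ("abc", "www.google.com/blah", "2016-08-04"),
        ("abc", "www.google.com/page3", "2016-08-05"),
    ],
)

# from_first anchors row 0 on the first blah visit, from_last on the last one.
both_query = """
select url,
       rownum - min(case when url like '%/blah' then rownum end)
                    over (partition by sessionid) as from_first,
       rownum - max(case when url like '%/blah' then rownum end)
                    over (partition by sessionid) as from_last
from (select sessionid, url, created,
             row_number() over (partition by sessionid order by created) as rownum
      from visits
     ) t
order by sessionid, rownum
"""

result = conn.execute(both_query).fetchall()
from_first = [row[1] for row in result]
from_last = [row[2] for row in result]
print(from_first)  # [-1, 0, 1, 2, 3]
print(from_last)   # [-3, -2, -1, 0, 1]
```

With min(), the second visit to blah simply gets a positive offset (2 here), which matches the question's requirement that the first visit be treated as the zeroth.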
Upvotes: 3