Reputation: 404
I have some data in web_event table of Netezza in this below format.
vstr_id | sessn_id | sessn_ts | wbpg_nm
V1 | V1S1 | 02-02-2015 09:20:00 | /home/login
V1 | V1S1 | 02-02-2015 09:22:00 | -1
V1 | V1S1 | 02-02-2015 09:30:00 | /home/contacts
V1 | V1S1 | 02-02-2015 09:32:00 | -1
V1 | V1S1 | 02-02-2015 09:50:00 | /home/search
V1 | V1S1 | 02-02-2015 09:55:00 | -1
V2 | V2S1 | 02-02-2015 09:10:00 | /home
V2 | V2S1 | 02-02-2015 09:15:00 | /home/apps
V2 | V2S2 | 02-02-2015 09:20:00 | /home/news
V2 | V2S2 | 02-02-2015 09:23:00 | /home/news/internal
This is my source table.
I am trying to use that web_event table and create another table like below.
I want the sessn_durtn table and time_on_pg table to be loaded like below.
1) time_on_page column : It is the time difference between the current page and next page load and the last page of the session can have 0 secs if there are no other events or page loads. It can be represented in minutes or seconds.
Insert into time_on_pg (select VSTR_ID,
SESSN_ID,
sessn_ts,
WBPG_NM,
????? as time_on_page
from web_event)
vstr_id | sessn_id | sessn_ts | wbpg_nm | wanted_time_on_page | currently_known_time_on_page
V1 | V1S1 | 02-02-2015 09:20:00 | /home/login | 10mins | 2mins
V1 | V1S1 | 02-02-2015 09:22:00 | -1 | | 8mins
V1 | V1S1 | 02-02-2015 09:30:00 | /home/contacts | 20mins | 2mins
V1 | V1S1 | 02-02-2015 09:32:00 | -1 | | 18mins
V1 | V1S1 | 02-02-2015 09:50:00 | /home/search | 5mins | 5mins
V1 | V1S1 | 02-02-2015 09:55:00 | -1 | |
V2 | V2S1 | 02-02-2015 09:10:00 | /home | 5mins | 5mins
V2 | V2S1 | 02-02-2015 09:15:00 | /home/apps | |
V2 | V2S2 | 02-02-2015 09:20:00 | /home/news | 3mins | 3mins
V2 | V2S2 | 02-02-2015 09:23:00 | /home/news/internal | |
How can we do this in Netezza or any SQL query?
i have the logic to calculate the currently_known_time_on_page using
SELECT vstr_id,
sessn_id,
sessn_ts,
wbpg_nm,
???????? AS wanted_time_on_page,
extract(epoch from (lag(event_ts) over (partition by vstr_id, sessn_id order by event_ts DESC) - event_ts)) AS currently_known_time_on_page
from web_event;
The main difference between wanted_time_on_page and currently_known_time_on_page is eliminating "-1" pages while calculating time difference except for last page.
Upvotes: 0
Views: 389
Reputation: 14361
I think event_ts is the same as sessn_ts ???? Anyway here is a query that should work for you it uses the OUTER APPLY
technique to lookup results in the table that are after (> sessn_ts)
are not web page -1
, and then get's the top result ascending order.
Just change the table name to your table.
Here is a solution that primarily is using an outer apply
but also using a common table expression (cte
) to set the time for the last '-1'
needed.
;WITH cteMaxNeg1 AS (
SELECT
sessn_id
,MaxNeg1SessnTs = MAX(CASE WHEN we.webpg_nm = '-1' THEN we.sessn_ts ELSE NULL END)
,MaxPageSessnTs = MAX(CASE WHEN we.webpg_nm <> '-1' THEN we.sessn_ts ELSE NULL END)
FROM
@WebEvents we
GROUP BY
sessn_id
)
SELECT
we.*
,currently_known_time_on_page = ISNULL(LAG(we.sessn_ts) over (partition by we.vstr_id, we.sessn_id order by we.sessn_ts DESC) - we.sessn_ts,CAST(0 AS DATETIME))
,WantedTimeOnPage = CASE
WHEN we.sessn_ts = m.MaxPageSessnTs AND we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,m.MaxNeg1SessnTs)
WHEN we.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,we.sessn_ts,o.sessn_ts)
ELSE NULL
END
FROM
@WebEvents we
LEFT JOIN cteMaxNeg1 m
ON we.sessn_id = m.sessn_id
OUTER APPLY (
SELECT TOP 1sessn_ts
FROM
@WebEvents i
WHERE
i.webpg_nm <> '-1'
AND i.sessn_id = we.sessn_id
AND i.sessn_ts > we.sessn_ts
ORDER BY
i.sessn_ts ASC
) o
ORDER BY
we.sessn_id
,we.sessn_ts
Here is a solution just using CTE and window function
;WITH cte AS (
SELECT
*
,RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts)
,LastNeg1RowNum = ROW_NUMBER() OVER (PARTITION BY sessn_id, IIF(webpg_nm = '-1',0,1) ORDER BY sessn_ts DESC)
FROM
@WebEvents
)
SELECT
c1.*
,WantedTimeOnPage = CASE
WHEN c1.LastNeg1RowNum = 1 AND c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c3.sessn_ts)
WHEN c1.webpg_nm <> '-1' THEN DATEDIFF(MINUTE,c1.sessn_ts,c2.sessn_ts)
ELSE NULL
END
FROM
cte c1
LEFT JOIN cte c2
ON c1.sessn_id = c2.sessn_id
AND (c1.RowNum + 1) = c2.RowNum
AND c2.webpg_nm <> '-1'
LEFT JOIN cte c3
ON c1.sessn_id = c3.sessn_id
AND c3.LastNeg1RowNum = 1
AND c3.webpg_nm = '-1'
ORDER BY
c1.sessn_id
,c1.sessn_ts
Test Data I used from you:
DECLARE @WebEvents AS TABLE (vstr_id CHAR(2), sessn_id CHAR(5), sessn_ts DATETIME, webpg_nm VARCHAR(100))
INSERT INTO @WebEvents (vstr_id, sessn_id, sessn_ts, webpg_nm)
VALUES
('V1','V1S1','02-02-2015 09:20:00','/home/login')
,('V1','V1S1','02-02-2015 09:22:00','-1')
,('V1','V1S1','02-02-2015 09:30:00','/home/contacts')
,('V1','V1S1','02-02-2015 09:32:00','-1')
,('V1','V1S1','02-02-2015 09:50:00','/home/search')
,('V1','V1S1','02-02-2015 09:55:00','-1')
,('V2','V2S1','02-02-2015 09:10:00','/home')
,('V2','V2S1','02-02-2015 09:15:00','/home/apps')
,('V2','V2S2','02-02-2015 09:20:00','/home/news')
,('V2','V2S2','02-02-2015 09:23:00','/home/news/internal')
Upvotes: 1
Reputation: 369
I don't know how large your dataset is and how much RAM you have available. This query is doing it all in memory. You could turn each individual CTE into temp tables for speed.
WITH CTE_SessionOrder AS (
SELECT
sessn_id
,sessn_ts
,wbpg_nm
,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts DESC) AS RowNum -- This is sorted Desc to get last row
FROM
web_event
)
,CTE_KeepLastRow AS (
SELECT *
FROM
CTE_SessionOrder
WHERE
RowNum = 1
AND wbpg_nm = '-1'
)
,CTE_OtherRows AS (
SELECT *
FROM
CTE_SessionOrder
WHERE
wbpg_nm != '-1'
)
,CTE_FilteredData AS (
SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_KeepLastRow
UNION
SELECT sessn_id,sessn_ts,wbpg_nm FROM CTE_OtherRows
)
,CTE_FilterOrderedData AS (
SELECT
*
,ROW_NUMBER() OVER(PARTITION BY sessn_id ORDER BY sessn_ts) AS RowNum -- Now Ordered Asc
FROM
CTE_FilteredData
)
,CTE_FinalData AS (
SELECT
D1.sessn_id
,D1.sessn_ts
,D1.wbpg_nm
,DATEDIFF(mi,D1.sessn_ts,D2.sessn_ts) time_on_page
FROM
CTE_FilterOrderedData D1
LEFT JOIN CTE_FilterOrderedData D2
ON D1.sessn_id = D2.sessn_id
AND D1.RowNum + 1 = D2.RowNum
UNION
SELECT
sessn_id
,sessn_ts
,wbpg_nm
,CAST(NULL AS INT) time_on_page
FROM
CTE_SessionOrder
WHERE
RowNum != 1
AND wbpg_nm = '-1'
)
SELECT *
FROM
CTE_FinalData
Upvotes: 2