Reputation: 4787
I have a dataset that looks like the following:
For each OwnerID
, I'd like to calculate the difference in column creationtime
for the current record and the next record (for the same ownerID
), in the form of a new column TimeDiff
. I believe a self join would be required here, but I'm not sure how to use the self join to calculate the difference between the current record and the next record.
While doing this, the very last record for any ownerID
can have a default value of 'NA' as there won't be a next record (for the same ownerID
) to calculate the difference from.
Here's the query that I used to get this dataset:
SELECT DISTINCT ga.ownerid,
mr.name,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
ga.creationtime,
a.encodedid
I'm using Amazon Redshift
to get this data.
Any help would be appreciated.
TIA!
UPDATE:
I used the method suggested by @systemjack . Here are the results that I get:
We can clearly notice here that the column encodedid
is getting repeated for the same assignmentID
(MAT.PRB.410
, as highlighted in the image above), which shouldn't be the case. This wasn't happening without the LEAD
function, in the query mentioned above. Here's the updated query that I am using (only has an extra LEAD
function):
SELECT DISTINCT ga.ownerid,
mr.name,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
LEAD(ga.creationtime,1) OVER (PARTITION BY ga.ownerid ORDER BY ga.creationtime) AS nexttime,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
ga.creationtime,
a.encodedid LIMIT 1000
The values in the nexttime
column also seem to be jacked up. It seems to be taking the next value in the creationtime
column on ocassion. For example: In the 2nd record, the value of nexttime
column should've been 2013-09-18 06:14:59
instead of 2014-01-18 12:16:49
Why are we getting more records than expected? How do I fix these problems?
Upvotes: 3
Views: 155
Reputation: 4787
So I finally figured out a way to achieve this. I made use of Dense_Rank()
function and used the following query to get the result:
WITH t AS
(
SELECT DISTINCT ga.ownerid,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
DENSE_RANK() OVER (PARTITION BY ga.ownerid ORDER BY ga.ownerid,ga.creationtime,a.encodedid) AS RowNum,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
ga.creationtime,
RowNum,
a.encodedid
)
SELECT top 100 t1.ownerid,
t1.assignmentid,
t1.year,
t1.month,
t1.day,
t1.dow,
t1.creationtime,
t2.creationtime,
datediff(day,t1.creationtime,t2.creationtime),
t1.encodedid,
t1.name
FROM t AS t1
LEFT JOIN t AS t2
ON t1.ownerid = t2.ownerid
AND t1.rownum + 1 = t2.rownum
ORDER BY t1.ownerid,
t1.creationtime,
t1.rownum,
t1.encodedid
This gave me the following:
Upvotes: 0
Reputation: 2985
Update: Does this look better?
with dataset as (
SELECT DISTINCT ga.ownerid,
mr.name,
SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
EXTRACT(DAY FROM ga.creationtime) AS DAY,
EXTRACT(DOW FROM ga.creationtime) AS DOW,
ga.creationtime,
a.encodedid,
a.name
FROM flx2.groupactivities ga
JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND a.artifacttypeid = 54
AND a.encodedid IS NOT NULL
)
select d.*,
LEAD(creationtime,1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
from dataset d
ORDER BY ownerid, creationtime, encodedid, nextime
LIMIT 1000
Something like this (untested code) might work. Idea is to use the LEAD window function to get the creationtime
of the following record for each owner, which will be null if it's the last record, and then use regular DATEDIFF to get the units you want. The CASE statement in the outer query handles the last record edge case and you can tweak that to get the result you want there.
select ownerid, creationtime,
case when nextime is not null
then datediff('second', creationtime, nextime)
else datediff('second', creationtime, sysdate)
end as timediff
from (
select distinct ownerid, creationtime,
lead(creationtime,1) over (partition by ownerid order by creationtime) as nexttime
from yourdata
)
Upvotes: 2
Reputation: 1213
I personally see no declarative (pure SQL) way to achieve that. Sorry. You can't refer value in particular records (even whether it next or prev) in the set, and that is by nature.
So there's three ways I can see here:
1) Use Procedural Extension to SQL (MySQL has one too).
2) Get the whole set and process it externally, at the "client" (to RDBMS) side.
3) Add a timediff column to the table + AFTER INSERT/UPDATE trigger where you'd be calculating that difference and append the record with it.
Upvotes: 1