Patthebug
Patthebug

Reputation: 4787

Calculate difference in time between different records

I have a dataset that looks like the following:

enter image description here

For each OwnerID, I'd like to calculate the difference in column creationtime for the current record and the next record (for the same ownerID), in the form of a new column TimeDiff. I believe a self join would be required here, but I'm not sure how to use the self join to calculate the difference between the current record and the next record.

While doing this, the very last record for any ownerID can have a default value of 'NA' as there won't be a next record (for the same ownerID) to calculate the difference from.

Here's the query that I used to get this dataset:

    SELECT DISTINCT ga.ownerid,
         mr.name,
         SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
         EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
         EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
         EXTRACT(DAY FROM ga.creationtime) AS DAY,
         EXTRACT(DOW FROM ga.creationtime) AS DOW,
         ga.creationtime,
         a.encodedid,
         a.name
  FROM flx2.groupactivities ga
    JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
    JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
    JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
    JOIN flx2.memberroles mr ON mr.id = mhr.roleid
  WHERE ga.activitytype = 'assign'
  AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
  AND   a.artifacttypeid = 54
  AND   a.encodedid IS NOT NULL
  ORDER BY ga.ownerid,
           ga.creationtime,
           a.encodedid

I'm using Amazon Redshift to get this data.

Any help would be appreciated.

TIA!

UPDATE:

I used the method suggested by @systemjack . Here are the results that I get:

enter image description here

We can clearly notice here that the column encodedid is getting repeated for the same assignmentID (MAT.PRB.410, as highlighted in the image above), which shouldn't be the case. This wasn't happening without the LEAD function, in the query mentioned above. Here's the updated query that I am using (only has an extra LEAD function):

SELECT DISTINCT ga.ownerid,
       mr.name,
       SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
       EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
       EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
       EXTRACT(DAY FROM ga.creationtime) AS DAY,
       EXTRACT(DOW FROM ga.creationtime) AS DOW,
       ga.creationtime,
       LEAD(ga.creationtime,1) OVER (PARTITION BY ga.ownerid ORDER BY ga.creationtime) AS nexttime,
       a.encodedid,
       a.name
FROM flx2.groupactivities ga
  JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
  JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
  JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
  JOIN flx2.memberroles mr ON mr.id = mhr.roleid
WHERE ga.activitytype = 'assign'
AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
AND   a.artifacttypeid = 54
AND   a.encodedid IS NOT NULL
ORDER BY ga.ownerid,
         ga.creationtime,
         a.encodedid LIMIT 1000

The values in the nexttime column also seem to be jacked up. It seems to be taking the next value in the creationtime column on ocassion. For example: In the 2nd record, the value of nexttime column should've been 2013-09-18 06:14:59 instead of 2014-01-18 12:16:49

Why are we getting more records than expected? How do I fix these problems?

Upvotes: 3

Views: 155

Answers (3)

Patthebug
Patthebug

Reputation: 4787

So I finally figured out a way to achieve this. I made use of Dense_Rank() function and used the following query to get the result:

WITH t AS
(
  SELECT DISTINCT ga.ownerid,
         SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
         EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
         EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
         EXTRACT(DAY FROM ga.creationtime) AS DAY,
         EXTRACT(DOW FROM ga.creationtime) AS DOW,
         ga.creationtime,
         DENSE_RANK() OVER (PARTITION BY ga.ownerid ORDER BY ga.ownerid,ga.creationtime,a.encodedid) AS RowNum,
         a.encodedid,
         a.name
  FROM flx2.groupactivities ga
    JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
    JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
    JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
    JOIN flx2.memberroles mr ON mr.id = mhr.roleid
  WHERE ga.activitytype = 'assign'
  AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
  AND   a.artifacttypeid = 54
  AND   a.encodedid IS NOT NULL
  ORDER BY ga.ownerid,
           ga.creationtime,
           RowNum,
           a.encodedid
)
SELECT top 100 t1.ownerid,
       t1.assignmentid,
       t1.year,
       t1.month,
       t1.day,
       t1.dow,
       t1.creationtime,
       t2.creationtime,
       datediff(day,t1.creationtime,t2.creationtime),
       t1.encodedid,
       t1.name
FROM t AS t1
  LEFT JOIN t AS t2
         ON t1.ownerid = t2.ownerid
        AND t1.rownum + 1 = t2.rownum
ORDER BY t1.ownerid,
         t1.creationtime,
         t1.rownum,
         t1.encodedid

This gave me the following:

enter image description here

Upvotes: 0

systemjack
systemjack

Reputation: 2985

Update: Does this look better?

with dataset as (
    SELECT DISTINCT ga.ownerid,
        mr.name,
        SPLIT_PART(SPLIT_PART(ga.activitydata,' ',2),',',1) AS Assignmentid,
        EXTRACT(YEAR FROM ga.creationtime) AS YEAR,
        EXTRACT(MONTH FROM ga.creationtime) AS MONTH,
        EXTRACT(DAY FROM ga.creationtime) AS DAY,
        EXTRACT(DOW FROM ga.creationtime) AS DOW,
        ga.creationtime,
        a.encodedid,
        a.name
    FROM flx2.groupactivities ga
    JOIN flx2.memberstudytrackitemstatus mstis ON SPLIT_PART (SPLIT_PART (ga.activitydata,' ',2),',',1) = mstis.assignmentid
    JOIN flx2.artifacts a ON mstis.studytrackitemid = a.id
    JOIN auth.memberhasroles mhr ON mhr.memberid = ga.ownerid
    JOIN flx2.memberroles mr ON mr.id = mhr.roleid
    WHERE ga.activitytype = 'assign'
        AND   ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
        AND   a.artifacttypeid = 54
        AND   a.encodedid IS NOT NULL
)
select d.*,
    LEAD(creationtime,1) OVER (PARTITION BY ownerid ORDER BY creationtime) AS nexttime
from dataset d
ORDER BY ownerid, creationtime, encodedid, nextime
LIMIT 1000

Something like this (untested code) might work. Idea is to use the LEAD window function to get the creationtime of the following record for each owner, which will be null if it's the last record, and then use regular DATEDIFF to get the units you want. The CASE statement in the outer query handles the last record edge case and you can tweak that to get the result you want there.

select ownerid, creationtime,
    case when nextime is not null
        then datediff('second', creationtime, nextime)
        else datediff('second', creationtime, sysdate)
        end as timediff
from (
    select distinct ownerid, creationtime,
        lead(creationtime,1) over (partition by ownerid order by creationtime) as nexttime
    from yourdata
)

Upvotes: 2

Yuri G
Yuri G

Reputation: 1213

I personally see no declarative (pure SQL) way to achieve that. Sorry. You can't refer value in particular records (even whether it next or prev) in the set, and that is by nature.

So there's three ways I can see here:

1) Use Procedural Extension to SQL (MySQL has one too).

2) Get the whole set and process it externally, at the "client" (to RDBMS) side.

3) Add a timediff column to the table + AFTER INSERT/UPDATE trigger where you'd be calculating that difference and append the record with it.

Upvotes: 1

Related Questions