Reputation: 59

how to calculate total time between begin and end timestamps across multiple records in postgresql

I have a postgresql 9.1 table with begin and end timestamps that record time periods that may overlap for a userid. Example:

Userid   Begin                     End
1        2014-01-19 21:14:59+00    2014-01-19 21:30:00+00
2        2014-01-19 21:19:29+00    2014-01-19 21:40:30+00
1        2014-01-19 21:16:29+00    2014-01-19 21:31:30+00
3        2014-01-19 21:15:22+00    2014-01-19 21:30:29+00
2        2014-01-19 21:29:59+00    2014-01-19 21:45:00+00
3        2014-01-19 21:15:25+00    2014-01-19 21:35:10+00
4        2014-01-19 22:00:01+00    2014-01-19 22:05:20+00

I need to get the sum of all minutes per Userid across these rows ensuring that time is not double counted for the individual Userid, so that the output would be:

Userid   Hour      Total Minutes
1        21        n
2        21        n
3        21        n
4        22        n

Upvotes: 2

Answers (1)

Gordon Linoff

Reputation: 1270573

This is a hard problem, but at least Postgres has lag() functionality.

Here is the idea. Assuming that you have overlaps. Find the first element that has no overlap. Give it a flag value of 1. Then do a cumulative sum of this value. The result is that the different time periods are now assigned values according to their "islands". Then simple aggregation works.

The following will work in many circumstances, using lag():

select userid, sum(secs) / 60 as minutes
from (select userid, Island, min(begin) as begin, max(end) as end,
             extract(epoch from max(end) - min(begin)) as secs
      from (select t.*,
                   sum(IslandBegin) over (partition by userid order by begin) as Island
            from (select t.*,
                         (case when lag(end) over (partition by userid order by begin) >= begin
                               then 0
                               else 1
                          end) as IslandBegin
                  from table t
                 ) t
           ) t
      group by userid, Island
     ) t
group by userid;

Note that end is a reserved word, so adjust the code accordingly.

The above does not always work. It assumes that the overlap is with the previous begin, and that might not be the case. Consider {(1, 100), (2, 5), (8, 10)}. I think the right logic still requires a correlated subquery. The innermost query needs to change from:

            from (select t.*,
                         (case when lag(end) over (partition by userid order by begin) >= begin
                               then 0
                               else 1
                          end) as IslandBegin
                  from table t
                 ) t

to:

            from (select t.*,
                         coalesce((select 1
                                   from table t2
                                   where t2.end >= t.begin and
                                         t2.begin < t.begin
                                   limit 1
                                  ), 0
                                 ) as IslandBegin
                  from table t
                 ) t

Actually, that wasn't so bad, and I learned that lag() can't be used for all such cases.

Upvotes: 1

how to calculate total time between begin and end timestamps across multiple records in postgresql

Answers (1)

Related Questions