the_lotus
the_lotus

Reputation: 12748

Grouping by dates with an error threshold

I have a bunch of items with dates/time.

[auto increment], [last_modified_date]
item1, 2013-sep-01 11:01:01
item2, 2013-sep-01 11:01:02
item3, 2013-sep-01 11:10:04
item4, 2013-sep-01 11:10:05
item5, 2013-sep-01 11:10:06
item6, 2013-sep-02 10:10:01
item7, 2013-sep-02 10:10:01

I need to group the items by date but the grouping must support an error threshold of a few seconds (let say 4 seconds). I would have this as a new results.

[auto increment], [last_modified_date]
group1, 2013-sep-01 11:01:01
group2, 2013-sep-01 11:10:04
group3, 2013-sep-02 10:10:01

item1, 2013-sep-01 11:01:01, group1
item2, 2013-sep-01 11:01:02, group1
item3, 2013-sep-01 11:10:04, group2
item4, 2013-sep-01 11:10:05, group2
item5, 2013-sep-01 11:10:06, group2
item6, 2013-sep-02 10:10:01, group3
item7, 2013-sep-02 10:10:01, group3

I can easely do it by looping all items, check if a group exists and add a new group if it doesn't.
select coun(*) from group where abs((group.last_modified_date-item.last_modified_date)*24*60*60) < 4

But looping is pretty slow. Is there a way to get the dates to be inserted in the group table with a single query? Subqueries are fine, my goal is to remove the loop.

A sort of select last_modified_date from item group by last_modified_date but that also group dates that are near each other.

Upvotes: 0

Views: 124

Answers (2)

Colin &#39;t Hart
Colin &#39;t Hart

Reputation: 7739

One trick you can use is to assign records to a group using analytic functions. You can use lag in conjunction with case to flag the first "member" of a group -- which is when the previous row's date is more than N seconds prior to the current row's date (I'm choosing N = 5 here). Then all the starting times for new groups are those records which are flagged:

with groups as (
  select
    auto_increment,
    last_modified_date,
    case when last_modified_date -
      lag(last_modified_date, 1, date '1900-01-01')
        over (order by last_modified_date) < (1 / 24 / 60 * 5) then 1 else 0
    end as starts_new_group
  from your_table
)
select
  auto_increment,
  last_modified_date
where
  starts_new_group = 1;

NB I didn't test this code! If you can create a SQL Fiddle with data I can debug the query if necessary.

Upvotes: 2

Hart CO
Hart CO

Reputation: 34784

You could create a driver table via cte of the groupings, then JOIN to that using BETWEEN. I don't know oracle syntax, but this works in SQL Server, so hopefully you can adapt it:

DECLARE @begindate DATETIME = (SELECT MIN(last_modified_date) FROM #Table1)
       ,@enddate DATETIME = (SELECT MAX(last_modified_date) FROM #Table1)
;WITH cte AS (SELECT @begindate AS group_beg
                    ,DATEADD(second,3,@begindate) AS group_end
                    , 1 AS Group_
              UNION  ALL
              SELECT DATEADD(second,4,group_beg)
                   , DATEADD(second,7,group_beg)
                   , Group_ + 1
              FROM cte
              WHERE group_beg < @enddate
             )
SELECT a.*,DENSE_RANK() OVER(ORDER BY b.Group_)
FROM #Table1 a
JOIN cte b
  ON a.last_modified_date BETWEEN b.group_beg AND b.group_end
OPTION (MAXRECURSION 0)

Upvotes: 0

Related Questions