Reputation: 12748
I have a bunch of items with dates/time.
[auto increment], [last_modified_date]
item1, 2013-sep-01 11:01:01
item2, 2013-sep-01 11:01:02
item3, 2013-sep-01 11:10:04
item4, 2013-sep-01 11:10:05
item5, 2013-sep-01 11:10:06
item6, 2013-sep-02 10:10:01
item7, 2013-sep-02 10:10:01
I need to group the items by date but the grouping must support an error threshold of a few seconds (let say 4 seconds). I would have this as a new results.
[auto increment], [last_modified_date]
group1, 2013-sep-01 11:01:01
group2, 2013-sep-01 11:10:04
group3, 2013-sep-02 10:10:01item1, 2013-sep-01 11:01:01, group1
item2, 2013-sep-01 11:01:02, group1
item3, 2013-sep-01 11:10:04, group2
item4, 2013-sep-01 11:10:05, group2
item5, 2013-sep-01 11:10:06, group2
item6, 2013-sep-02 10:10:01, group3
item7, 2013-sep-02 10:10:01, group3
I can easely do it by looping all items, check if a group exists and add a new group if it doesn't.
select coun(*) from group where abs((group.last_modified_date-item.last_modified_date)*24*60*60) < 4
But looping is pretty slow. Is there a way to get the dates to be inserted in the group table with a single query? Subqueries are fine, my goal is to remove the loop.
A sort of select last_modified_date from item group by last_modified_date but that also group dates that are near each other.
Upvotes: 0
Views: 124
Reputation: 7739
One trick you can use is to assign records to a group using analytic functions. You can use lag
in conjunction with case
to flag the first "member" of a group -- which is when the previous row's date is more than N seconds prior to the current row's date (I'm choosing N = 5 here). Then all the starting times for new groups are those records which are flagged:
with groups as (
select
auto_increment,
last_modified_date,
case when last_modified_date -
lag(last_modified_date, 1, date '1900-01-01')
over (order by last_modified_date) < (1 / 24 / 60 * 5) then 1 else 0
end as starts_new_group
from your_table
)
select
auto_increment,
last_modified_date
where
starts_new_group = 1;
NB I didn't test this code! If you can create a SQL Fiddle with data I can debug the query if necessary.
Upvotes: 2
Reputation: 34784
You could create a driver table via cte of the groupings, then JOIN
to that using BETWEEN
. I don't know oracle syntax, but this works in SQL Server, so hopefully you can adapt it:
DECLARE @begindate DATETIME = (SELECT MIN(last_modified_date) FROM #Table1)
,@enddate DATETIME = (SELECT MAX(last_modified_date) FROM #Table1)
;WITH cte AS (SELECT @begindate AS group_beg
,DATEADD(second,3,@begindate) AS group_end
, 1 AS Group_
UNION ALL
SELECT DATEADD(second,4,group_beg)
, DATEADD(second,7,group_beg)
, Group_ + 1
FROM cte
WHERE group_beg < @enddate
)
SELECT a.*,DENSE_RANK() OVER(ORDER BY b.Group_)
FROM #Table1 a
JOIN cte b
ON a.last_modified_date BETWEEN b.group_beg AND b.group_end
OPTION (MAXRECURSION 0)
Upvotes: 0