matt124
matt124

Reputation:

sql query - select duplicates within a 12 hour period

if i have data as follows

A | 01/01/2008 00:00:00

B | 01/01/2008 01:00:00

A | 01/01/2008 11:59:00

C | 02/01/2008 00:00:00

D | 02/01/2008 01:00:00

D | 02/01/2008 20:00:00

I want to only select the records whose identifiers (A, B, C or D) have occured twice within a 12 hour period. In this example above this would only be 'A'

Can anyone help please (this is for an Oracle data base)

Thanks

M

Upvotes: 3

Views: 3528

Answers (6)

Charles Bretana
Charles Bretana

Reputation: 146499

 Select Distinct A.Identifer  
  From Table A
    Join Table B --  EDIT to eliminate self Joins (to same row)
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer               
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >=  B.OccurTime 
           And A.OccurTime < B.OccurTime + .5

and to implement question asked in comment, (Ignoring records which are on different days)

-- for SQL Server,

   Select Distinct A.Identifer  
    From Table A
      Join Table B 
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >= B.OccurTime  
           And A.OccurTime < B.OccurTime + .5
    Where DateDiff(day, A.OccurTime, B.OccurTime) = 0

-- or for oracle...

 Select Distinct A.Identifer  
    From Table A
      Join Table B 
        On A.PrimKey <> B.PrimaryKey
           And A.Identifer = B.Identifer
           -- EDIT to fix case where 2 at same time
           And A.OccurTime >= B.OccurTime  
           And A.OccurTime < B.OccurTime + .5
 Where Trunc(A.OccurTime) = Trunc(B.OccurTime)

Upvotes: 7

hamishmcn
hamishmcn

Reputation: 7981

SELECT namecol FROM tbl A
WHERE EXISTS (
  SELECT 1 from tbl B
  WHERE b.namecol = a.namecol
  AND b.timestamp > a.timestamp
  AND b.timestamp - 0.5 <= a.timestamp )

Upvotes: 0

Mark Brady
Mark Brady

Reputation:

I haven't checked William's query but I would seriously consider using what he has over every other. Analytics are da bomb. Anytime you find yourself joining a table back to itself is virtually guaranteed to be an opportunity to use analytics and will out perform the query with one table referenced twice every time.

You'll be amazed how much faster the analytic solution will be.

Upvotes: 2

William
William

Reputation: 801

I'm not 100% sure of your requirements, however this might give you some ideas about how to do what you need. For example you said exactly 2; what if there are 3 occurances? etc.

create table t (ident varchar2(16), occurance timestamp);

insert into t (ident, occurance) values ('a', to_date('20080101000000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('b', to_date('20080101010000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('a', to_date('20080101115900', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('c', to_date('20080102000000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080102010000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080102200000', 'yyyymmddhh24miss'));

insert into t (ident, occurance) values ('d', to_date('20080103020000', 'yyyymmddhh24miss'));

select ident, occurance
from
(
select ident, occurance,
    lag(occurance) over (partition by ident order by occurance) previous, 
    lead(occurance) over (partition by ident order by occurance) next 
from t
)
where 
    ((occurance-previous<interval'12:00' hour to minute and extract(day from occurance) = extract(day from previous))
    or (next-occurance<interval'12:00' hour to minute and extract(day from occurance) = extract(day from next)))
/

Upvotes: 1

dmajkic
dmajkic

Reputation: 3488

Select    
  A.Id
From    
  YourTable A 
Where    
  A.YourDateTime Between :StartDateTime and :EndDateTime 
Group By    
  A.Id
Having    
  COUNT(A.Id) = 2

Upvotes: 2

Justin Cave
Justin Cave

Reputation: 231671

SELECT identifier
  FROM table_name outer
 WHERE EXISTS( SELECT 1
                 FROM table_name inner
                WHERE inner.identifier  = outer.identifier
                  AND inner.date_column BETWEEN outer.date_column AND outer.date_column + interval '12' hour
                  AND inner.rowid != outer.rowid )

Upvotes: 1

Related Questions