debug_04
debug_04

Reputation: 47

Select records all within 10 minutes from each other

I have some data coming from a source in my Oracle database.

If a particular Office_ID has been deactivated and it has all three clients (A,B,C) for a particular day, then we have to check whether all clients have gone. If yes, then we need to check whether timeframe for all clients is within 10 Minutes.

If this repeats three times in a day for a particular office we declare the office as closed.

Here is some sample data:

+-----------+-----------+--------------+--------+
| OFFICE_ID | FAIL_TIME | ACTIVITY_DAY | CLIENT |
| 1002      | 5:39:00   | 23/01/2015   | A      |
| 1002      | 17:49:00  | 23/12/2014   | A      |
| 1002      | 18:41:57  | 1/5/2014     | B      |
| 1002      | 10:32:00  | 1/7/2014     | A      |
| 1002      | 10:34:23  | 1/7/2014     | B      |
| 1002      | 10:35:03  | 1/7/2014     | C      |
| 1002      | 12:08:52  | 1/7/2014     | B      |
| 1002      | 12:09:00  | 1/7/2014     | A      |
| 1002      | 12:26:10  | 1/7/2014     | B      |
| 1002      | 13:31:32  | 1/7/2014     | B      |
| 1002      | 15:24:06  | 1/7/2014     | B      |
| 1002      | 15:55:06  | 1/7/2014     | C      |
+-----------+-----------+--------------+--------+

The result should be like this:

1002    10:32:00      A
1002    10:34:23      B
1002    10:35:03      C

Any help would be appreciated. I am looking for a SQL query or a PL/SQL procedure.

Upvotes: 2

Views: 1779

Answers (3)

MT0
MT0

Reputation: 167822

A solution using the COUNT analytic function with a RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING that avoids self-joins:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE Test ( OFFICE_ID, FAIL_TIME, ACTIVITY_DAY, CLIENT ) AS
          SELECT 1002,  '5:39:00', '23/01/2015', 'A' FROM DUAL
UNION ALL SELECT 1002, '17:49:00', '23/12/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '18:41:57', '1/5/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '10:32:00', '1/7/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '10:34:23', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '10:35:03', '1/7/2014', 'C' FROM DUAL
UNION ALL SELECT 1002, '12:08:52', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '12:09:00', '1/7/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '12:26:10', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '13:31:32', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '15:24:06', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '15:55:06', '1/7/2014', 'C' FROM DUAL

Query 1:

WITH Times AS (
  SELECT OFFICE_ID,
         TO_DATE( ACTIVITY_DAY || ' ' || FAIL_TIME, 'DD/MM/YYYY HH24/MI/SS' ) AS FAIL_DATETIME,
         CLIENT
  FROM   Test
),
Next_Times As (
  SELECT OFFICE_ID,
         FAIL_DATETIME,
         COUNT( CASE CLIENT WHEN 'A' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_A,
         COUNT( CASE CLIENT WHEN 'B' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_B,
         COUNT( CASE CLIENT WHEN 'C' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_C
  FROM   Times
)
SELECT OFFICE_ID,
       TO_CHAR( FAIL_DATETIME, 'HH24:MI:SS' ) AS FAIL_TIME,
       TO_CHAR( FAIL_DATETIME, 'DD/MM/YYYY' ) AS ACTIVITY_DAY       
FROM   Next_Times
WHERE  COUNT_A > 0
AND    COUNT_B > 0
AND    COUNT_C > 0
ORDER BY FAIL_DATETIME

Results:

| OFFICE_ID | FAIL_TIME | ACTIVITY_DAY |
|-----------|-----------|--------------|
|      1002 |  10:32:00 |   01/07/2014 |
|      1002 |  10:34:23 |   01/07/2014 |
|      1002 |  10:35:03 |   01/07/2014 |

Upvotes: 1

APC
APC

Reputation: 146199

The first thing we need is a way of comparing FAIL_TIME. As you haven't posted a table structure let's assume we're dealing with strings.

Oracle has some neat built-ins for casting dates and strings. If we concatenate ACTIVITY_DATE and FAIL_TIME we can convert them to a DATE data type:

to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss')

We can cast that to a string representing the number of seconds past midnight:

to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')

Then we can cast that to a number, which we can use in some arithmetic to compare with other rows; ten minutes = 600 seconds.

Next we can use the subquery factoring (the WITH clause). One of the neat features of this syntax is that we can pass the output of one subquery into another one, so we only only need to write that gnarly nested cast expression once.

with t as
    ( select OFFICE_ID
               , ACTIVITY_DAY
               , FAIL_TIME
               , to_number(to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')) FAIL_TIME_SSSSS
               , CLIENT
      from faillog
    )

We can use this sub-query to build other subqueries which separate the table's rows into sets for each CLIENT for use in our main query.

Finally we can use an analytic COUNT() function to track how many bunches of FAIL_TIME we have for each OFFICE and ACTIVITY_DATE combo.

count(*) over (partition by a.OFFICE_ID, a.ACTIVITY_DAY) 

Putting it all together in an in-line view allows us to test for whether we can "declare the office as closed".

select * from (
    with t as ( select OFFICE_ID
                       , ACTIVITY_DAY
                       , FAIL_TIME
                       , to_number(to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')) FAIL_TIME_SSSSS
                       , CLIENT
                from faillog
                )
         , a as (select *
                  from t
                  where CLIENT = 'A' )
         , b as (select *
                  from t
                  where CLIENT = 'B' )
         , c as (select *
                  from t
                  where CLIENT = 'C' )
    select a.OFFICE_ID
           , a.ACTIVITY_DAY 
           , a.FAIL_TIME as a_fail_time
           , b.FAIL_TIME as b_fail_time
           , c.FAIL_TIME as a_fail_time
           , count(*) over (partition by a.OFFICE_ID, a.ACTIVITY_DAY) as fail_count
    from a 
         join b on a.OFFICE_ID = b.OFFICE_ID and a.ACTIVITY_DAY = b.ACTIVITY_DAY
         join c on a.OFFICE_ID = c.OFFICE_ID and a.ACTIVITY_DAY = c.ACTIVITY_DAY
    where a.FAIL_TIME_SSSSS between b.FAIL_TIME_SSSSS - 600 and b.FAIL_TIME_SSSSS + 600
    and   a.FAIL_TIME_SSSSS between c.FAIL_TIME_SSSSS - 600 and c.FAIL_TIME_SSSSS + 600
    and   b.FAIL_TIME_SSSSS between a.FAIL_TIME_SSSSS - 600 and a.FAIL_TIME_SSSSS + 600
    and   b.FAIL_TIME_SSSSS between c.FAIL_TIME_SSSSS - 600 and c.FAIL_TIME_SSSSS + 600
    and   c.FAIL_TIME_SSSSS between a.FAIL_TIME_SSSSS - 600 and a.FAIL_TIME_SSSSS + 600
    and   c.FAIL_TIME_SSSSS between b.FAIL_TIME_SSSSS - 600 and b.FAIL_TIME_SSSSS + 600
)
where fail_count >= 3
/

Notes

  1. Obviously I have hard-coded the CLIENT identifier in the subqueries. It would be possible to avoid the hard-coding, but the sample query is already complicated enough.
  2. This query doesn't search for triplets. Providing there is one failure for each of A, B and C within a ten minute window it doesn't matter how many instances of each CLIENT occur within the window. There's nothing in your business rules to say this is wrong.
  3. Similarly, the same instance of one CLIENT can be matched with instances of other CLIENTs in overlapping windows. Now this may be undesirable: double or triple counting may inflate the FAIL_COUNT. But again, handling this will make the final query more complicated.
  4. The query as presented has one row for each distinct combo of A, B and C FAIL_TIME values. The result set can be pivoted if you really need a row for each CLIENT/FAIL_TIME.

Upvotes: 0

Bulat
Bulat

Reputation: 6969

To identify records you can join table to it self three times like this:

SELECT
  a.*, b.*, c.*
FROM FailLog a INNER JOIN
     FailLog b ON b.OFFICE_ID = A.OFFICE_ID AND 
          a.CLIENT = 'A' AND 
          b.CLIENT = 'B' AND
          b.ACTIVITY_DAY = a.ACTIVITY_DAY INNER JOIN
     FailLog c ON c.OFFICE_ID = A.OFFICE_ID AND 
          c.CLIENT = 'C' AND 
          c.ACTIVITY_DAY = a.ACTIVITY_DAY AND
          -- need to calculate difference in min here
          GREATEST (a.FAIL_TIME, b. FAIL_TIME, c. FAIL_TIME) - 
          LEAST (a.FAIL_TIME, b. FAIL_TIME, c. FAIL_TIME) <= 10 

The output will give you one row instead of three as requested in the question, but that will be the right level for the fault data, as all three clients should have it.

Upvotes: 0

Related Questions