Omnus Ruthius
Omnus Ruthius

Reputation: 120

Oracle SQL Discrepancy: COUNT(*) vs Actual Result Set

I have an application that keeps track when a file is being “attempted” to move from one server to another, as well as when it has “succeeded” or “failed.” An "attempt" should always be paired with a "success" or "failure." However, there are 63 “orphaned” attempts, meaning there have been attempts without any success or failure reported. My first query shows where I got the 63 number to begin with: I take a count of all of the attempts and subtract the successes and failures-

SELECT
(
    SELECT COUNT(*) FROM e_table
    WHERE e_comment LIKE '%attempt%'
    AND e_date >= '23-MAY-2016'
    AND e_date <= '26-MAY-2016'
)
-
(
    SELECT
    (
        SELECT COUNT(*) FROM e_table
        WHERE e_comment LIKE '%success%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    )
    +
    (
        SELECT COUNT(*) FROM e_table
        WHERE e_comment LIKE '%failure%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    ) FROM dual
) AS orphaned_attempts FROM dual;

So my second query to get the specific e_id of the 63 attempts is as follows:

SELECT * FROM
(
    SELECT e_id FROM e_table
    WHERE e_comment LIKE '%attempt%'
    AND e_date >= '23-MAY-2016'
    AND e_date <= '26-MAY-2016'
)
MINUS
(
    SELECT e_id FROM e_table
    WHERE e_comment LIKE '%success%'
    AND e_date >= '23-MAY-2016'
    AND e_date <= '26-MAY-2016'
)
MINUS
(
    SELECT e_id FROM e_table
    WHERE e_comment LIKE '%failure%'
    AND e_date >= '23-MAY-2016'
    AND e_date <= '26-MAY-2016'
);

What I need (and expect based on the first query’s result set) is to have a 63-row result set with one column containing the e_id of the orphaned attempts. Instead, I am getting only 49 rows back from the second query. Any help is greatly appreciated.

Upvotes: 1

Views: 359

Answers (7)

MT0
MT0

Reputation: 168001

Oracle Setup:

CREATE TABLE e_comment ( ce_id, e_id, e_comment, e_date ) AS
SELECT '472', '125', 'is attempting to move',  TIMESTAMP '2016-05-23 09:49:10' FROM DUAL UNION ALL
SELECT '678', '125', 'is attempting to move',  TIMESTAMP '2016-05-23 11:37:09' FROM DUAL UNION ALL
SELECT '724', '125', 'has successfully moved', TIMESTAMP '2016-05-23 11:37:09' FROM DUAL UNION ALL
SELECT '983', '034', 'is attempting to move',  TIMESTAMP '2016-05-24 17:04:35' FROM DUAL UNION ALL
SELECT '643', '672', 'is attempting to move',  TIMESTAMP '2016-05-25 13:28:36' FROM DUAL UNION ALL
SELECT '026', '672', 'failed to move',         TIMESTAMP '2016-05-25 13:28:36' FROM DUAL UNION ALL
SELECT '087', '672', 'is attempting to move',  TIMESTAMP '2016-05-24 18:33:35' FROM DUAL UNION ALL
SELECT '921', '375', 'is attempting to move',  TIMESTAMP '2016-05-26 19:12:43' FROM DUAL UNION ALL
SELECT '345', '375', 'has successfully moved', TIMESTAMP '2016-05-26 19:12:43' FROM DUAL;

Query - Get the count of the attempts, successes and failures:

A much simpler query that does not require multiple self-joins:

SELECT e_id,
       COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END ) AS attempts,
       COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END ) AS successes,
       COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END ) AS failures
FROM   e_comment
GROUP BY e_id;

Output:

E_ID   ATTEMPTS  SUCCESSES   FAILURES
---- ---------- ---------- ----------
034           1          0          0 
672           2          0          1 
375           1          1          0 
125           2          1          0 

Query - Get the orphaned e_ids:

SELECT e_id,
       COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END ) AS attempts,
       COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END ) AS successes,
       COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END ) AS failures
FROM   e_comment
GROUP BY e_id
HAVING COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END )
       <> COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END )
        + COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END );

Output:

E_ID   ATTEMPTS  SUCCESSES   FAILURES
---- ---------- ---------- ----------
034           1          0          0 
672           2          0          1 
125           2          1          0 

Upvotes: 1

Thorsten Kettner
Thorsten Kettner

Reputation: 94913

Your count mismatch stems probably from having multiple attempts per e_id.

Example:

Counting:       count (1,1,2,3) - ( count(2) + count(3) ) = 4 - (1 + 1) = 2

Set operations:       (1,1,2,3) - (      (2) +      (3) ) = (1)

To get the desired IDs, you'd only have to group by them and compare counts, e.g.:

select e_id 
from e_table
where e_date between date'2016-05-23' and date'2016-05-26'
group by e_id
having count(case when e_comment like '%attempt%' then 1 end) >
       count(case when e_comment like '%success%' or e_comment like '%failure%' then 1 end);

Don't use strings for dates by the way, but use proper date literals as shown above. (And most of all: never use month names without a proper to_date specifying the matching nls_date_language.)

Upvotes: 3

shawnt00
shawnt00

Reputation: 17915

SELECT
    a.e_id,
    coalesce(attempts, 0) attempts,
    coalesce(successes, 0) successes,
    coalesce(failures, 0) failures
FROM
    (
        SELECT e_id, count(*) as attempts FROM e_table
        WHERE e_comment LIKE '%attempt%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) a
    full outer join
    (
        SELECT e_id, count(*) as successes FROM e_table
        WHERE e_comment LIKE '%success%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) s
        on s.e_id = a.e_id
    full outer join
    (
        SELECT e_id, count(*) as failures FROM e_table
        WHERE e_comment LIKE '%failure%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
        GROUP BY e_id
    ) f
        on f.e_id = coalesce(a.e_id, s.e_id)
WHERE
    coalesce(attempts, 0) <> coalesce(successes, 0) + coalesce(failures, 0)

I changed to full outer joins so you can verify that there are no successes and/or failures without any matching attempt. This should let you find e_ids where something's going wrong in the logging. It should be easier to start dealing with finer numbers and not just listings of id values.

Others have pointed out the potential for multiple attempts on the same id but is it conceivable that a success and failure could both be recorded the same way as say in some kind of retry scenario? We don't know what the full comments look like. As a possible explanation, can a single comment can contain more than one of the words "attempt", "success", "failure"?

Here's something else to consider: Are you sure that all your success and failures events will fall within the same date window? In other words, is there some delay following the attempt? It might not have to be very long if this happens around midnight. You may want to widen the success and failure ranges enough to compensate for this (and change to left outer joins.)

Note: Condition in the where clause has been modified to allow for multiple attempts (as noted in comments) and now just looks for a balance in the number of attempts vs. successes and failures.

Upvotes: 5

David Aldridge
David Aldridge

Reputation: 52376

The query you need would be something like:

select e_id
from   e_table e1
where  e_comment like '%attempt%'
and    e_date between date '2016-05-23' and  date '2016-05-26'
and    not exists( select null
                   from   e_table e2
                   where  e2.e_id = e1.e_id
                   and    (e2.e_comment like '%success%' or 
                           e2.e_comment like '%failure%'))

The semantics of this seem to match your requirement most closely.

I left the date condition out of the correlated subquery to allow for successes and failures outside of the specified window, but it may help to add in an extra clause of the form:

and e2.e_date >= date '2016-05-23

... or ...

and e2.e_date >= e1.e_date

If you have an index on e_id and the cardinality is close to unique, then performance would not depend on that, however.

Upvotes: 2

Mottor
Mottor

Reputation: 1948

Better see what you have per e_id and then decide what to do ;)

SELECT e_id,
       count(*) c,
       sum(sign(instr(e_comment, 'attempt'))) c_a, 
       sum(sign(instr(e_comment, 'success'))) c_s, 
       sum(sign(instr(e_comment, 'failure'))) c_f
 FROM e_table
 WHERE e_date >= '23-MAY-2016' AND e_date <= '26-MAY-2016'
GROUP BY e_id

Upvotes: 1

Mr. Llama
Mr. Llama

Reputation: 20899

The following should work. It uses a common table expression for clarity, but it's not necessary and could be written with a regular LEFT OUTER JOIN.

WITH 
attempts AS (
    SELECT e_id
    FROM e_table
    WHERE e_comment LIKE '%attempt%'
),
results AS (
    SELECT e_id
    FROM e_table
    WHERE e_comment LIKE '%success%'
       OR e_comment LIKE '%failure%'
)
SELECT a.e_id
FROM attempts a
LEFT OUTER JOIN results r
  ON a.e_id = r.e_id
WHERE r.e_id IS NULL
  AND e_date >= '23-MAY-2016'
  AND e_date <= '26-MAY-2016';

It joins a list of the attempts to the list of successes/failures. If the attempt has no matching success/failure and it falls within the desired date range, the corresponding e_id is returned.

Upvotes: 0

ScaisEdge
ScaisEdge

Reputation: 133370

you can get all the row using select where in

   select * from e_table 
   where id in (SELECT * FROM
    (
        SELECT e_id FROM e_table
        WHERE e_comment LIKE '%attempt%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    )
    MINUS
    (
        SELECT e_id FROM e_event
        WHERE e_comment LIKE '%success%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    )
    MINUS
    (
        SELECT e_id FROM e_event
        WHERE e_comment LIKE '%failure%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    ));

in this way should obtain all rows involved and easy chack for duplicated id ..

   select id, count(*) from e_table 
   where id in (SELECT * FROM
    (
        SELECT e_id FROM e_table
        WHERE e_comment LIKE '%attempt%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    )
    MINUS
    (
        SELECT e_id FROM e_event
        WHERE e_comment LIKE '%success%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    )
    MINUS
    (
        SELECT e_id FROM e_event
        WHERE e_comment LIKE '%failure%'
        AND e_date >= '23-MAY-2016'
        AND e_date <= '26-MAY-2016'
    ));
    group by id
    having count(*) >1;

in this way you obtain the id more that an entry

Upvotes: 0

Related Questions