Reputation: 120
I have an application that keeps track when a file is being “attempted” to move from one server to another, as well as when it has “succeeded” or “failed.” An "attempt" should always be paired with a "success" or "failure." However, there are 63 “orphaned” attempts, meaning there have been attempts without any success or failure reported. My first query shows where I got the 63 number to begin with: I take a count of all of the attempts and subtract the successes and failures-
SELECT
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
-
(
SELECT
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
+
(
SELECT COUNT(*) FROM e_table
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
) FROM dual
) AS orphaned_attempts FROM dual;
So my second query to get the specific e_id of the 63 attempts is as follows:
SELECT * FROM
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
);
What I need (and expect based on the first query’s result set) is to have a 63-row result set with one column containing the e_id of the orphaned attempts. Instead, I am getting only 49 rows back from the second query. Any help is greatly appreciated.
Upvotes: 1
Views: 359
Reputation: 168001
Oracle Setup:
CREATE TABLE e_comment ( ce_id, e_id, e_comment, e_date ) AS
SELECT '472', '125', 'is attempting to move', TIMESTAMP '2016-05-23 09:49:10' FROM DUAL UNION ALL
SELECT '678', '125', 'is attempting to move', TIMESTAMP '2016-05-23 11:37:09' FROM DUAL UNION ALL
SELECT '724', '125', 'has successfully moved', TIMESTAMP '2016-05-23 11:37:09' FROM DUAL UNION ALL
SELECT '983', '034', 'is attempting to move', TIMESTAMP '2016-05-24 17:04:35' FROM DUAL UNION ALL
SELECT '643', '672', 'is attempting to move', TIMESTAMP '2016-05-25 13:28:36' FROM DUAL UNION ALL
SELECT '026', '672', 'failed to move', TIMESTAMP '2016-05-25 13:28:36' FROM DUAL UNION ALL
SELECT '087', '672', 'is attempting to move', TIMESTAMP '2016-05-24 18:33:35' FROM DUAL UNION ALL
SELECT '921', '375', 'is attempting to move', TIMESTAMP '2016-05-26 19:12:43' FROM DUAL UNION ALL
SELECT '345', '375', 'has successfully moved', TIMESTAMP '2016-05-26 19:12:43' FROM DUAL;
Query - Get the count of the attempts, successes and failures:
A much simpler query that does not require multiple self-joins:
SELECT e_id,
COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END ) AS attempts,
COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END ) AS successes,
COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END ) AS failures
FROM e_comment
GROUP BY e_id;
Output:
E_ID ATTEMPTS SUCCESSES FAILURES
---- ---------- ---------- ----------
034 1 0 0
672 2 0 1
375 1 1 0
125 2 1 0
Query - Get the orphaned e_id
s:
SELECT e_id,
COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END ) AS attempts,
COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END ) AS successes,
COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END ) AS failures
FROM e_comment
GROUP BY e_id
HAVING COUNT( CASE WHEN e_comment LIKE '%attempt%' THEN 1 END )
<> COUNT( CASE WHEN e_comment LIKE '%success%' THEN 1 END )
+ COUNT( CASE WHEN e_comment LIKE '%failed%' THEN 1 END );
Output:
E_ID ATTEMPTS SUCCESSES FAILURES
---- ---------- ---------- ----------
034 1 0 0
672 2 0 1
125 2 1 0
Upvotes: 1
Reputation: 94913
Your count mismatch stems probably from having multiple attempts per e_id.
Example:
Counting: count (1,1,2,3) - ( count(2) + count(3) ) = 4 - (1 + 1) = 2
Set operations: (1,1,2,3) - ( (2) + (3) ) = (1)
To get the desired IDs, you'd only have to group by them and compare counts, e.g.:
select e_id
from e_table
where e_date between date'2016-05-23' and date'2016-05-26'
group by e_id
having count(case when e_comment like '%attempt%' then 1 end) >
count(case when e_comment like '%success%' or e_comment like '%failure%' then 1 end);
Don't use strings for dates by the way, but use proper date literals as shown above. (And most of all: never use month names without a proper to_date
specifying the matching nls_date_language
.)
Upvotes: 3
Reputation: 17915
SELECT
a.e_id,
coalesce(attempts, 0) attempts,
coalesce(successes, 0) successes,
coalesce(failures, 0) failures
FROM
(
SELECT e_id, count(*) as attempts FROM e_table
WHERE e_comment LIKE '%attempt%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
GROUP BY e_id
) a
full outer join
(
SELECT e_id, count(*) as successes FROM e_table
WHERE e_comment LIKE '%success%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
GROUP BY e_id
) s
on s.e_id = a.e_id
full outer join
(
SELECT e_id, count(*) as failures FROM e_table
WHERE e_comment LIKE '%failure%' AND e_date BETWEEN '23-MAY-2016' AND '26-MAY-2016'
GROUP BY e_id
) f
on f.e_id = coalesce(a.e_id, s.e_id)
WHERE
coalesce(attempts, 0) <> coalesce(successes, 0) + coalesce(failures, 0)
I changed to full outer joins so you can verify that there are no successes and/or failures without any matching attempt. This should let you find e_id
s where something's going wrong in the logging. It should be easier to start dealing with finer numbers and not just listings of id values.
Others have pointed out the potential for multiple attempts on the same id but is it conceivable that a success and failure could both be recorded the same way as say in some kind of retry scenario? We don't know what the full comments look like. As a possible explanation, can a single comment can contain more than one of the words "attempt", "success", "failure"?
Here's something else to consider: Are you sure that all your success and failures events will fall within the same date window? In other words, is there some delay following the attempt? It might not have to be very long if this happens around midnight. You may want to widen the success and failure ranges enough to compensate for this (and change to left outer joins.)
Note: Condition in the where
clause has been modified to allow for multiple attempts (as noted in comments) and now just looks for a balance in the number of attempts vs. successes and failures.
Upvotes: 5
Reputation: 52376
The query you need would be something like:
select e_id
from e_table e1
where e_comment like '%attempt%'
and e_date between date '2016-05-23' and date '2016-05-26'
and not exists( select null
from e_table e2
where e2.e_id = e1.e_id
and (e2.e_comment like '%success%' or
e2.e_comment like '%failure%'))
The semantics of this seem to match your requirement most closely.
I left the date condition out of the correlated subquery to allow for successes and failures outside of the specified window, but it may help to add in an extra clause of the form:
and e2.e_date >= date '2016-05-23
... or ...
and e2.e_date >= e1.e_date
If you have an index on e_id and the cardinality is close to unique, then performance would not depend on that, however.
Upvotes: 2
Reputation: 1948
Better see what you have per e_id and then decide what to do ;)
SELECT e_id,
count(*) c,
sum(sign(instr(e_comment, 'attempt'))) c_a,
sum(sign(instr(e_comment, 'success'))) c_s,
sum(sign(instr(e_comment, 'failure'))) c_f
FROM e_table
WHERE e_date >= '23-MAY-2016' AND e_date <= '26-MAY-2016'
GROUP BY e_id
Upvotes: 1
Reputation: 20899
The following should work. It uses a common table expression for clarity, but it's not necessary and could be written with a regular LEFT OUTER JOIN
.
WITH
attempts AS (
SELECT e_id
FROM e_table
WHERE e_comment LIKE '%attempt%'
),
results AS (
SELECT e_id
FROM e_table
WHERE e_comment LIKE '%success%'
OR e_comment LIKE '%failure%'
)
SELECT a.e_id
FROM attempts a
LEFT OUTER JOIN results r
ON a.e_id = r.e_id
WHERE r.e_id IS NULL
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016';
It joins a list of the attempts to the list of successes/failures. If the attempt has no matching success/failure and it falls within the desired date range, the corresponding e_id
is returned.
Upvotes: 0
Reputation: 133370
you can get all the row using select where in
select * from e_table
where id in (SELECT * FROM
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_event
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_event
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
));
in this way should obtain all rows involved and easy chack for duplicated id ..
select id, count(*) from e_table
where id in (SELECT * FROM
(
SELECT e_id FROM e_table
WHERE e_comment LIKE '%attempt%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_event
WHERE e_comment LIKE '%success%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
)
MINUS
(
SELECT e_id FROM e_event
WHERE e_comment LIKE '%failure%'
AND e_date >= '23-MAY-2016'
AND e_date <= '26-MAY-2016'
));
group by id
having count(*) >1;
in this way you obtain the id more that an entry
Upvotes: 0