Reputation: 125
Suppose I have the following table:
User_ID Activity_ID
123 222
123 333
124 222
124 224
124 333
125 224
125 333
I want to return a count users by the different combinations of overlaps such as the following:
Activity_ID_1 Activity_ID_2 Count_of_Users
222 333 2
222 224 2
In the above example, there are 2 users who completed both 223 AND 333.
I do not want to define each combination manually since there are 93 different activity_ids I am working with. Is there a way to do this purely in Oracle SQL?
Upvotes: 0
Views: 86
Reputation: 168281
Oracle Setup:
CREATE TABLE data ( User_ID, Activity_ID ) AS
SELECT 123, 222 FROM DUAL UNION ALL
SELECT 123, 333 FROM DUAL UNION ALL
SELECT 124, 222 FROM DUAL UNION ALL
SELECT 124, 224 FROM DUAL UNION ALL
SELECT 124, 333 FROM DUAL UNION ALL
SELECT 125, 224 FROM DUAL UNION ALL
SELECT 125, 333 FROM DUAL;
CREATE TYPE INTLIST AS TABLE OF INT;
/
Query:
WITH Activities ( User_IDs, Activity_ID ) AS (
SELECT CAST( COLLECT( User_ID ) AS INTLIST ),
Activity_ID
FROM data
GROUP BY Activity_ID
)
SELECT a.Activity_ID,
b.Activity_ID,
CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) AS "Count"
FROM Activities a
INNER JOIN
Activities b
ON ( CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) > 1
AND a.Activity_ID < b.Activity_ID );
Output:
ACTIVITY_ID ACTIVITY_ID Count
----------- ----------- ----------
222 333 2
224 333 2
Upvotes: 0
Reputation:
Assuming you have an activity
table with activity id's, and you want to count only DISTINCT users who had the same two activities (the same user having both activities twice wouldn't count):
select a1.activity_id, a2.activity_id, count(distinct f.user_id)
from activity a1 inner join facts f on a1.activity_id = f.activity_id
inner join activity a2 on a2.activity_id = f.activity_id
where a1.activity_id < a2.activity_id
group by a1.activity_id, a2.activity_id
having count(distinct f.user_id) >= 2
;
facts
is the name of your facts table (the one you show in your question).
EDIT: If the facts
table (or view or subquery or whatever) is already "distinct"-ed by user_id, then delete "distinct" from my solution; this will make it more efficient. NOTE: "distinct" appears twice, once in SELECT and again in HAVING.
Upvotes: 1