datatista
datatista

Reputation: 125

Oracle: How to find overlaps in rows

Suppose I have the following table:

User_ID Activity_ID
123     222
123     333
124     222
124     224
124     333
125     224
125     333

I want to return a count users by the different combinations of overlaps such as the following:

Activity_ID_1 Activity_ID_2 Count_of_Users
222           333           2 
222           224           2

In the above example, there are 2 users who completed both 223 AND 333.

I do not want to define each combination manually since there are 93 different activity_ids I am working with. Is there a way to do this purely in Oracle SQL?

Upvotes: 0

Views: 86

Answers (2)

MT0
MT0

Reputation: 168281

Oracle Setup:

CREATE TABLE data ( User_ID, Activity_ID ) AS
SELECT 123, 222 FROM DUAL UNION ALL
SELECT 123, 333 FROM DUAL UNION ALL
SELECT 124, 222 FROM DUAL UNION ALL
SELECT 124, 224 FROM DUAL UNION ALL
SELECT 124, 333 FROM DUAL UNION ALL
SELECT 125, 224 FROM DUAL UNION ALL
SELECT 125, 333 FROM DUAL;

CREATE TYPE INTLIST AS TABLE OF INT;
/

Query:

WITH Activities ( User_IDs, Activity_ID ) AS (
  SELECT CAST( COLLECT( User_ID ) AS INTLIST ),
         Activity_ID
  FROM   data
  GROUP BY Activity_ID
)
SELECT a.Activity_ID,
       b.Activity_ID,
       CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) AS "Count"
FROM   Activities a
       INNER JOIN
       Activities b
       ON ( CARDINALITY( a.User_IDs MULTISET INTERSECT b.User_IDs ) > 1
           AND a.Activity_ID < b.Activity_ID );

Output:

ACTIVITY_ID ACTIVITY_ID      Count
----------- ----------- ----------
        222         333          2 
        224         333          2 

Upvotes: 0

user5683823
user5683823

Reputation:

Assuming you have an activity table with activity id's, and you want to count only DISTINCT users who had the same two activities (the same user having both activities twice wouldn't count):

select a1.activity_id, a2.activity_id, count(distinct f.user_id)
from   activity a1 inner join facts    f  on a1.activity_id = f.activity_id
                   inner join activity a2 on a2.activity_id = f.activity_id
where  a1.activity_id < a2.activity_id
group by a1.activity_id, a2.activity_id
having count(distinct f.user_id) >= 2
;

facts is the name of your facts table (the one you show in your question).

EDIT: If the facts table (or view or subquery or whatever) is already "distinct"-ed by user_id, then delete "distinct" from my solution; this will make it more efficient. NOTE: "distinct" appears twice, once in SELECT and again in HAVING.

Upvotes: 1

Related Questions