A l w a y s S u n n y
A l w a y s S u n n y

Reputation: 38502

PostgreSQL find common combination of pairs on same column

WHAT I NEED

Find pairs of platforms that platform2 offers more than 50% of the series offered by platform1. Return pairs of platforms (renamed platform1, platform2) and proportion of series they have in common (numcommon)

WHAT I TRIED: Here I've multiple series are broadcast on different platform. I just added couple of them. For example:

Full Dataset:

|seriesid|platform       |
|--------|---------------|
|4       |Netflix        |
|6       |Netflix        |
|8       |Netflix        |
|13      |Netflix        |
|13      |TNT            |
|13      |Free Services  |
|16      |Hulu           |
|16      |Netflix        |
|16      |Free Services  |
|20      |Netflix        |

EXPECTED OUTPUT:

By using the query of @Julius

Current Query

select 
  sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
from 
  seriesonplatform AS sp1
  JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
group by
  sp1.platform,
  sp2.platform
order by sp1.platform, sp2.platform,numcommon

Current Output

    |platform1|platform2      |numcommon|
|---------|---------------|---------|
|A&E      |Free Services  |1        |
|A&E      |Prime Video    |1        |
|AcornTV  |CBS All Access |1        |
|AcornTV  |Free Services  |1        |
|AcornTV  |Hoopla         |2        |
|AcornTV  |Hulu           |1        |
|AcornTV  |Netflix        |6        |
|AcornTV  |Prime Video    |2        |
|Adult Swim|Free Services  |1        |
|Adult Swim|HBO MAX        |1        |
|BET      |BET+           |1        |
|BET      |Free Services  |1        |
|BET+     |BET            |1        |
|BET+     |Free Services  |1        |
|Bravo    |Free Services  |1        |
|Bravo    |Netflix        |2        |
|BritBox  |Hoopla         |2        |
|CBS All Access|AcornTV        |1        |
|CBS All Access|Free Services  |6        |
|CBS All Access|Hulu           |1        |
|CBS All Access|Netflix        |7        |
|CBS All Access|fuboTV         |2        |
|Cartoon Network|Free Services  |2        |
|Cartoon Network|HBO MAX        |2        |
|Cartoon Network|Hulu           |1        |
|Cartoon Network|Netflix        |1        |
|Comedy Central|Hoopla         |1        |
|Comedy Central|Netflix        |1        |
|Crunchyroll|Free Services  |3        |

Expected Output

    platform1    |   platform2   | numcommon 
-----------------+---------------+-----------
 AcornTV         | Netflix       |      0.60
 Adult Swim      | Free Services |      1.00
 Adult Swim      | HBO MAX       |      1.00
 BET             | BET+          |      1.00
 BET             | Free Services |      1.00
 BET+            | BET           |      1.00
 BET+            | Free Services |      1.00
 Bravo           | Netflix       |      1.00
 Cartoon Network | Free Services |      0.67
 Cartoon Network | HBO MAX       |      0.67
 CBS All Access  | Netflix       |      0.58
 Crunchyroll     | Free Services |      1.00
 Disney          | Disney+       |      1.00
 Disney          | Free Services |      1.00
 fuboTV          | Netflix       |      0.60
 Funimation      | Free Services |      0.67
 HBO             | HBO MAX       |      1.00
 History         | Hulu          |      0.67
 History         | Netflix       |      0.67
 IFC             | fuboTV        |      0.67
 IFC             | Netflix       |      0.67
 IndieFlix       | Free Services |      1.00
 IndieFlix       | Hoopla        |      1.00
 Lifetime        | Netflix       |      0.67
 Showtime        | fuboTV        |      1.00
 Showtime        | Netflix       |      0.67
 Sundance        | Free Services |      1.00
 Sundance        | Prime Video   |      1.00

Upvotes: 0

Views: 107

Answers (1)

Julius Tuskenis
Julius Tuskenis

Reputation: 1620

I guess you need a join to find all platform pairs:

select 
  sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
from 
  seriesonplatform AS sp1
  JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
where 
  sp1.platform in ('BET','BET+')
  AND sp2.platform in ('BET','BET+')
group by
  sp1.platform,
  sp2.platform

EDIT 2

Using the query above you can find the common series number. All is left is finding percentage:

SELECT 
  platform1.platform, common.platform2, numcommon * 1.0 / platform1.seriescount AS show_on_platform2
FROM (
  SELECT 
    platform, count(*) AS seriescount
  FROM 
    seriesonplatform
  GROUP BY
    platform 
) platform1
LEFT JOIN (
  SELECT 
    sp1.platform as platform1, sp2.platform as platform2, count(*) as numcommon 
  FROM 
    seriesonplatform AS sp1
    JOIN seriesonplatform AS sp2 ON (sp1.seriesid = sp2.seriesid) AND (sp1.platform <>  sp2.platform)
  GROUP BY
    sp1.platform,
    sp2.platform
) AS common ON platform1.platform = common.platform1
WHERE 
  numcommon * 1.0 / platform1.seriescount > 0.5
ORDER BY
  platform1.platform, platform2

Upvotes: 1

Related Questions