Nick
Nick

Reputation: 1031

Set Overlap Percentage in SQL

Assume the following table structure in a SQL database:

Att1   |   Att2   |   Att3   |  Cnt

I would like to understand the overlap of Att3 for all Att2 combinations for a given Att1. For example if I had the following table:

123   |   456    |   abc    |  1
123   |   456    |   efg    |  1
123   |   456    |   hij    |  1
123   |   456    |   klm    |  1
123   |   456    |   nop    |  1
123   |   789    |   efg    |  1
123   |   789    |   abc    |  1
123   |   789    |   xyz    |  1
123   |   789    |   nop    |  1
345   |   456    |   abc    |  1
345   |   456    |   efg    |  1
345   |   789    |   abc    |  1
345   |   999    |   efg    |  1

I would generate the following output:

123   |   456    |   456    |  1.0
123   |   456    |   789    |  .6
123   |   789    |   456    |  .75
123   |   789    |   789    |  1.0
345   |   456    |   456    |  1.0
345   |   456    |   789    |  .5
345   |   456    |   999    |  .5
345   |   789    |   456    |  1.0
345   |   789    |   789    |  1.0 
345   |   789    |   999    |  0
345   |   999    |   456    |  1.0
345   |   999    |   789    |  0  
345   |   999    |   999    |  1.0

I understand that this may not be best accomplished using SQL, so I'm happy to hear other alternatives, but SQL is where I currently have the data available.

Upvotes: 0

Views: 345

Answers (1)

PinnyM
PinnyM

Reputation: 35531

You can do this using a few simple tricks with aggregates:

SELECT t1.att1, t1.att2, t2.att2 as att2_other, 
       SUM(CASE WHEN t2.att3 = t1.att3 THEN 1.0 ELSE 0 END)/COUNT(DISTINCT t1.att3) as Cnt
FROM table_name t1
JOIN table_name t2
  ON t1.att1 = t2.att1
GROUP BY t1.att1, t1.att2, t2.att2

Working sqlfiddle here

Upvotes: 6

Related Questions