Reputation: 1031
Assume the following table structure in a SQL database:
Att1 | Att2 | Att3 | Cnt
I would like to understand the overlap of Att3 for all Att2 combinations for a given Att1. For example if I had the following table:
123 | 456 | abc | 1
123 | 456 | efg | 1
123 | 456 | hij | 1
123 | 456 | klm | 1
123 | 456 | nop | 1
123 | 789 | efg | 1
123 | 789 | abc | 1
123 | 789 | xyz | 1
123 | 789 | nop | 1
345 | 456 | abc | 1
345 | 456 | efg | 1
345 | 789 | abc | 1
345 | 999 | efg | 1
I would generate the following output:
123 | 456 | 456 | 1.0
123 | 456 | 789 | .6
123 | 789 | 456 | .75
123 | 789 | 789 | 1.0
345 | 456 | 456 | 1.0
345 | 456 | 789 | .5
345 | 456 | 999 | .5
345 | 789 | 456 | 1.0
345 | 789 | 789 | 1.0
345 | 789 | 999 | 0
345 | 999 | 456 | 1.0
345 | 999 | 789 | 0
345 | 999 | 999 | 1.0
I understand that this may not be best accomplished using SQL, so I'm happy to hear other alternatives, but SQL is where I currently have the data available.
Upvotes: 0
Views: 345
Reputation: 35531
You can do this using a few simple tricks with aggregates:
SELECT t1.att1, t1.att2, t2.att2 as att2_other,
SUM(CASE WHEN t2.att3 = t1.att3 THEN 1.0 ELSE 0 END)/COUNT(DISTINCT t1.att3) as Cnt
FROM table_name t1
JOIN table_name t2
ON t1.att1 = t2.att1
GROUP BY t1.att1, t1.att2, t2.att2
Working sqlfiddle here
Upvotes: 6