Reputation: 12003
I wrote a program to generate tests composed of a combination of questions taken from a large pool of questions. There were a number of criteria for each test and the program saved them to database only if they satisfied these criteria.
My program was written to ensure as even a distribution of questions as possible, i.e., when generating combinations of questions, the algorithm prioritises questions from the pool that have been asked the fewest times in previous iterations.
I created one table, tests, to essentially store the test_id for each test and another, test_questions, to store test_ids and their corresponding question_ids using n rows per test (where n is the number of questions in each test).
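A minimal sketch of the two tables as described, built in an in-memory SQLite database through Python's sqlite3 module so it can be run anywhere. The column types, the keys and the foreign key are assumptions for illustration; the actual MySQL definitions are not shown in this post.

```python
import sqlite3

# In-memory database; nothing is written to disk.
schema_conn = sqlite3.connect(":memory:")

# Assumed layout: one row per test, and one row per (test, question) pair.
schema_conn.executescript("""
CREATE TABLE tests (
    test_id INTEGER PRIMARY KEY
);
CREATE TABLE test_questions (
    test_id     INTEGER NOT NULL REFERENCES tests(test_id),
    question_id INTEGER NOT NULL,
    PRIMARY KEY (test_id, question_id)
);
""")

# List the tables that were created.
table_names = [
    row[0]
    for row in schema_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

The composite primary key on test_questions is one way to guarantee that a question appears at most once per test, which keeps the overlap counts from being inflated by duplicates.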
Now that I have the tests stored in a database, I’d like to check that the overlap of questions between each pair of tests is within certain bounds, and I thought I should be able to do this using SQL.
Using a self-join, I was able to use this query to count the questions common to Test 3 and Test 5:
-- Get the number of questions that are common to tests 3 and 5
SELECT count(tq1.question_id) AS Overlap
FROM test_questions AS tq1
JOIN test_questions AS tq2
ON tq1.question_id = tq2.question_id
WHERE tq1.test_id = 5
AND tq2.test_id = 3;
I was able to generate each possible combination of test pairs from the first n (5) tests:
-- Get all combinations of pairs of tests from 1 to 5
SELECT t1.test_id AS Test1, t2.test_id AS Test2
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
WHERE t1.test_id <= 5
AND t2.test_id <= 5;
What I’d like to do but so far have failed to do is to combine the above two queries to show each possible pair combination of the first 5 tests – along with the number of questions that are common to both tests.
-- This doesn't work
SELECT t1.test_id AS Test1, t2.test_id AS Test2, count(tq1.question_id) AS Overlap
FROM tests AS t1
JOIN tests AS t2
ON t2.test_id > t1.test_id
JOIN test_questions AS tq1
ON t1.test_id = tq1.test_id
JOIN test_questions AS tq2
ON t2.test_id = tq2.test_id
WHERE t1.test_id <= 11
AND t2.test_id <= 11
GROUP BY t1.test_id, t2.test_id;
I’ve created a simplified version (with randomised data) of the two tables at this SQL Fiddle
Note: I’m using MySQL as my DBMS but the SQL should be compatible with the ANSI standard.
Edit: The program I wrote to generate the tests actually generated more than the number of tests I needed and I only want to compare the first n tests. In the example, I added a <= 5 WHERE condition to ignore the extra tests.
To clarify what I’m looking for as per Thorsten Kettner’s example data:
test 1: a, b and c
test 2: a, b and d
test 3: d, e and f
The results would be:
Test   Test   Overlap
Test1  Test2  2 (a and b in common)
Test1  Test3  0 (no questions in common)
Test2  Test3  1 (d is common to both)
Upvotes: 0
Views: 747
Reputation: 12003
I modified Gordon's answer and this query provides a listing of test combinations along with their corresponding overlap (questions in common):
SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1
JOIN test_questions tq2
ON tq1.question_id = tq2.question_id
AND tq1.test_id < tq2.test_id
WHERE tq1.test_id <= 5
AND tq2.test_id <= 5
GROUP BY tq1.test_id, tq2.test_id;
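As a quick check, the query can be run against the three-test example from the question (test 1: a, b, c; test 2: a, b, d; test 3: d, e, f). This is only a sketch: it uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL, stores question ids as text, and adds an ORDER BY so the output order is predictable. All of these are assumptions for illustration.

```python
import sqlite3

inner_conn = sqlite3.connect(":memory:")
inner_conn.execute(
    "CREATE TABLE test_questions (test_id INTEGER, question_id TEXT)"
)

# Example data from the question: one letter per question.
example = {1: "abc", 2: "abd", 3: "def"}
inner_conn.executemany(
    "INSERT INTO test_questions VALUES (?, ?)",
    [(test_id, q) for test_id, qs in example.items() for q in qs],
)

# Self-join on question_id; test_id1 < test_id2 keeps each pair once.
overlap_rows = inner_conn.execute("""
    SELECT tq1.test_id AS test_id1, tq2.test_id AS test_id2,
           COUNT(tq1.question_id) AS Overlap
    FROM test_questions tq1
    JOIN test_questions tq2
      ON tq1.question_id = tq2.question_id
     AND tq1.test_id < tq2.test_id
    WHERE tq1.test_id <= 5
      AND tq2.test_id <= 5
    GROUP BY tq1.test_id, tq2.test_id
    ORDER BY tq1.test_id, tq2.test_id
""").fetchall()

print(overlap_rows)  # [(1, 2, 2), (2, 3, 1)]
```

Note that the pair (1, 3) does not appear at all: an inner join only produces rows for pairs that share at least one question, so pairs with zero overlap are omitted instead of being reported as 0.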
Upvotes: 2
Reputation: 95072
select test_combinations.t1_test_id, test_combinations.t2_test_id, count(q2.question_id)
from
(
  select t1.test_id as t1_test_id, t2.test_id as t2_test_id
  from (select test_id from tests where test_id <= 5) t1
  join (select test_id from tests where test_id <= 5) t2 on t2.test_id > t1.test_id
) test_combinations
inner join test_questions q1
  on q1.test_id = test_combinations.t1_test_id
left join test_questions q2
  on q2.test_id = test_combinations.t2_test_id
  and q2.question_id = q1.question_id
group by test_combinations.t1_test_id, test_combinations.t2_test_id
order by test_combinations.t1_test_id, test_combinations.t2_test_id;
I've added a test with no overlapping questions to your fiddle and removed the restriction to test_id <= 5, so you see pairs of tests with zero overlapping questions: http://sqlfiddle.com/#!2/e83aa/1
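The same idea can be tried on the small example from the question (test 1: a, b, c; test 2: a, b, d; test 3: d, e, f). The sketch below is an illustration under assumptions, not the fiddle itself: it runs on an in-memory SQLite database via Python's sqlite3 module, uses text question ids, shortens the derived-table alias to c, and leaves out the test_id restriction.

```python
import sqlite3

pair_conn = sqlite3.connect(":memory:")
pair_conn.executescript("""
CREATE TABLE tests (test_id INTEGER PRIMARY KEY);
CREATE TABLE test_questions (test_id INTEGER, question_id TEXT);
""")

example = {1: "abc", 2: "abd", 3: "def"}
pair_conn.executemany("INSERT INTO tests VALUES (?)", [(t,) for t in example])
pair_conn.executemany(
    "INSERT INTO test_questions VALUES (?, ?)",
    [(test_id, q) for test_id, qs in example.items() for q in qs],
)

# Build every pair of tests first, then LEFT JOIN the second test's
# questions so that pairs without a match still produce a row.
pair_rows = pair_conn.execute("""
    SELECT c.t1_test_id, c.t2_test_id, COUNT(q2.question_id) AS Overlap
    FROM (
        SELECT t1.test_id AS t1_test_id, t2.test_id AS t2_test_id
        FROM tests t1
        JOIN tests t2 ON t2.test_id > t1.test_id
    ) c
    JOIN test_questions q1
      ON q1.test_id = c.t1_test_id
    LEFT JOIN test_questions q2
      ON q2.test_id = c.t2_test_id
     AND q2.question_id = q1.question_id
    GROUP BY c.t1_test_id, c.t2_test_id
    ORDER BY c.t1_test_id, c.t2_test_id
""").fetchall()

print(pair_rows)  # [(1, 2, 2), (1, 3, 0), (2, 3, 1)]
```

Counting q2.question_id is what makes the zero show up: COUNT ignores the NULLs that the left join produces for questions of the first test that are missing from the second.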
Upvotes: 1
Reputation: 1270713
You just need to add a group by to your first query (basically). I also added another condition, so the test ids are produced in order:
SELECT tq1.test_id as test_id1, tq2.test_id as test_id2, count(tq1.question_id) AS Overlap
FROM test_questions tq1 LEFT JOIN
test_questions tq2
ON tq1.question_id = tq2.question_id and
tq1.test_id < tq2.test_id
GROUP BY tq1.test_id, tq2.test_id;
This is standard SQL.
If you want to get all pairs of tests, even those that have no questions in common, here is another approach:
SELECT t1.test_id as test_id1, t2.test_id as test_id2, count(tq2.question_id) AS Overlap
FROM tests t1 CROSS JOIN
tests t2 LEFT JOIN
test_questions tq1
on t1.test_id = tq1.test_id LEFT JOIN
test_questions tq2
ON t2.test_id = tq2.test_id and tq1.question_id = tq2.question_id
GROUP BY t1.test_id, t2.test_id;
This assumes that you have a table with one row per test. If not, replace tests with (select distinct test_id from test_questions).
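Here is a rough sketch of that substitution, using only a test_questions table. It is run on an in-memory SQLite database through Python's sqlite3 module with the small example from the question (test 1: a, b, c; test 2: a, b, d; test 3: d, e, f), which are assumptions for illustration. The WHERE t1.test_id < t2.test_id filter and the ORDER BY are additions that are not in the query above; without the filter, the cross join also returns each test paired with itself and every pair in both orders.

```python
import sqlite3

derived_conn = sqlite3.connect(":memory:")
derived_conn.execute(
    "CREATE TABLE test_questions (test_id INTEGER, question_id TEXT)"
)

example = {1: "abc", 2: "abd", 3: "def"}
derived_conn.executemany(
    "INSERT INTO test_questions VALUES (?, ?)",
    [(test_id, q) for test_id, qs in example.items() for q in qs],
)

# The list of tests is derived from test_questions itself.
derived_rows = derived_conn.execute("""
    SELECT t1.test_id AS test_id1, t2.test_id AS test_id2,
           COUNT(tq2.question_id) AS Overlap
    FROM (SELECT DISTINCT test_id FROM test_questions) t1
    CROSS JOIN (SELECT DISTINCT test_id FROM test_questions) t2
    LEFT JOIN test_questions tq1
      ON t1.test_id = tq1.test_id
    LEFT JOIN test_questions tq2
      ON t2.test_id = tq2.test_id
     AND tq1.question_id = tq2.question_id
    WHERE t1.test_id < t2.test_id   -- added: keep each unordered pair once
    GROUP BY t1.test_id, t2.test_id
    ORDER BY t1.test_id, t2.test_id
""").fetchall()

print(derived_rows)  # [(1, 2, 2), (1, 3, 0), (2, 3, 1)]
```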
Upvotes: 4