Reputation: 21

Make the sum of a column a specific value query sql

I have this code to get questions of a database. I want the database to generate questions randonmly but the sum of the values of the question have to be 20. The database has to choose 10 questions radonmly and the value sum is 20.

I want 10 idquestion from the table questions where the idsubject is between (...) ordered by rand

    select idquestion, value
    from questions
    where idsubject in (1,3,5,9)
    order by rand()
    limit 10

I expect to appear a lists of 10 random questions and the sum of value equals 20 like this:

idquestion  |  value
15          |   3
11          |   4
14          |   2
18          |   1
21          |   1
45          |   1
25          |   3
65          |   1
10          |   2
12          |   2

As you can see, the sum of the column value is 20

EDIT

I have this query that sums the value column of the questions chosen randomnly, I want that the number of the sum is always 20

  SELECT SUM(value) FROM
  ((SELECT idquestion,value FROM quations WHERE idsubject IN (1,2,3)
  ORDER BY RAND()
  DESC LIMIT 10)) AS subt

Upvotes: 2

Answers (4)

LSerni

Reputation: 57398

The simple JOIN method unfortunately will very likely never work in practice.

Each subsequent JOIN will increase query time exponentially.

Let us try with 80 rows in a table having just id and value as integers. This is more or less the ideal situation.

select q1.id, q2.id, q3.id, q4.id 
FROM questions AS q1 
JOIN questions AS q2 ON (q1.id < q2.id) 
JOIN questions AS q3 ON (q2.id < q3.id) 
JOIN questions AS q4 ON (q3.id < q4.id) 
WHERE q1.value + q2.value + q3.value + q4.value = 8 ORDER BY rand() LIMIT 1;
+------+------+------+------+
| id   | id   | id   | id   |
+------+------+------+------+
|   22 |   61 |   76 |   78 |
+------+------+------+------+
1 row in set (0.74 sec)

0.74 seconds is pretty good. Let us step up to five questions:

SELECT q1.id, q2.id, q3.id, q4.id, q5.id 
FROM questions AS q1 
JOIN questions AS q2 ON (q1.id < q2.id) 
JOIN questions AS q3 ON (q2.id < q3.id) 
JOIN questions AS q4 ON (q3.id < q4.id) 
JOIN questions AS q5 ON (q4.id < q5.id) 
WHERE q1.value + q2.value + q3.value + q4.value + q5.value = 10 ORDER BY rand() LIMIT 1;

1 row in set (17.71 sec)

Time has increased by about a factor of 23. If this keeps up, even if it's only a factor 10 instead of 23, 6 questions will take around three minutes, seven questions half an hour, eight questions five hours, and ten questions around three weeks.

Even trying to trim down a bit the number of rows by adding constraints will not avail much, because each selection will now cost a little more.

SELECT q1.id, q2.id, q3.id, q4.id, q5.id 
FROM questions AS q1 
JOIN questions AS q2 ON (q1.id < q2.id AND q1.value + q2.value < 8)
JOIN questions AS q3 ON (q2.id < q3.id AND q1.value + q2.value + q3.value < 9) 
JOIN questions AS q4 ON (q3.id < q4.id AND q1.value + q2.value + q3.value + q4.value < 10) 
JOIN questions AS q5 ON (q4.id < q5.id) 
WHERE q1.value + q2.value + q3.value + q4.value + q5.value = 10 ORDER BY rand() LIMIT 1;

1 row in set (17.45 sec)

A method that will yield better results requires getting rid of the full table scan, which means ending the query as soon as there is one full row of data - but then you cannot randomize the data! To have random data, we employ a ugly hack which involves selecting the rows in deterministic order from a randomized (shuffled) table:

This has also the advantage of immediately extracting the columns and rows we strictly need from a possibly much larger questions table:

CREATE TABLE q123 AS SELECT id, value FROM questions WHERE idsubject IN(1,3,5,9) ORDER BY RAND();

(You cannot use a TEMPORARY TABLE here, maybe you can use a memory table, haven't tried).

On my system I use a simplified version with no subject:

mysql> CREATE TABLE q123 AS SELECT * FROM questions ORDER BY RAND();

mysql> SELECT q1.id, q2.id, q3.id, q4.id, q5.id, q6.id, q7.id, q8.id
FROM q123 AS q1
JOIN q123 AS q2 ON (q1.id < q2.id)
JOIN q123 AS q3 ON (q2.id < q3.id)
JOIN q123 AS q4 ON (q3.id < q4.id)
JOIN q123 AS q5 ON (q4.id < q5.id)
JOIN q123 AS q6 ON (q5.id < q6.id)
JOIN q123 AS q7 ON (q6.id < q7.id)
JOIN q123 AS q8 ON (q7.id < q8.id)
WHERE q1.value + q2.value + q3.value + q4.value + q5.value + q6.value + q7.value + q8.value = 16 LIMIT 1;
    -> FROM q123 AS q1
    -> JOIN q123 AS q2 ON (q1.id < q2.id)
    -> JOIN q123 AS q3 ON (q2.id < q3.id)
    -> JOIN q123 AS q4 ON (q3.id < q4.id)
    -> JOIN q123 AS q5 ON (q4.id < q5.id)
    -> JOIN q123 AS q6 ON (q5.id < q6.id)
    -> JOIN q123 AS q7 ON (q6.id < q7.id)
    -> JOIN q123 AS q8 ON (q7.id < q8.id)
    -> WHERE q1.value + q2.value + q3.value + q4.value + q5.value + q6.value + q7.value + q8.value = 16 LIMIT 1;
+------+------+------+------+------+------+------+------+
| id   | id   | id   | id   | id   | id   | id   | id   |
+------+------+------+------+------+------+------+------+
|    3 |    9 |   18 |   32 |   35 |   43 |   46 |   58 |
+------+------+------+------+------+------+------+------+
1 row in set (0.15 sec)

DROP TABLE q123;
CREATE TABLE q123 AS SELECT * FROM questions ORDER BY RAND();
...
+------+------+------+------+------+------+------+------+
| id   | id   | id   | id   | id   | id   | id   | id   |
+------+------+------+------+------+------+------+------+
|   13 |   22 |   23 |   32 |   37 |   39 |   45 |   63 |
+------+------+------+------+------+------+------+------+
1 row in set (0.18 sec)

Now, 8 questions take 0.18 seconds. 10 questions take 0.38 (which probably means that this method scales better, but not so much better).

Semi-SQL

The approach involving "q3.id > q2.id AND q3.value + q2.value + q1.value < X" is due to the following reasoning:

If I perform a full JOIN, questions X and Y will occur twice, one in the order (X, Y) and the other in the order (Y, X). To avoid this, I request that X and Y be in a known order - say, X.id < Y.id. Also, if I get questions 1, 2, and 3, the remaining questions at least have to score one point each. So I will score at least seven points. If I already scored more than 13 points between questions 1, 2 and 3, the remaining questions will surely blow beyond 20 and be no good. So 1, 2 and 3 must score less than 13 points.

This approach can be run in Java outside the database server. A first SELECT might reap ONE among all trios of questions with total sum less than 13 (or maybe less than 10, reasoning that you don't want three large score questions and seven small fry questions. If 10 qs get you 20 points, each question should be 2 on average - say 1 to 3 - so three questions get you at most 9 points).

Then you run subsequent selects one at a time, requesting id more than the highest id you've got so far, and value again in a suitable range based on the points accumulated so far.

Question uniformity goes to hell, and you'll need a recursive function because you might need to repeat the last queries, but total run time should grow linearithmically and no longer exponentially:

# VERY pseudo code, mixing SQL and something javascriptish

function getQuestionId(int maxidSoFar, int remainingQuestions, int remainingPoints, solution = [ ]) {
    minVal = max(1, floor(0.5 * remainingPoints / remainingQuestions))
    maxVal = min(1, floor(2.0 * remainingPoints / remainingQuestions))
    SELECT id, value FROM questions WHERE ... AND id > maxidSoFar AND value BETWEEN minVal AND maxVal ORDER BY RAND() LIMIT 3
    // Now for all (say) three possibilities
        candidate = getQuestionId(id, remainingQuestions - 1, remainingPoints - value)
        if (0 !== candidate.pop()) {
            return solution.push(id)
        }
    // failed
    return [ 0 ]

Divide and conquer

If acceptable, you can try a divide-and-conquer approach and not select 10 questions with total 20 points, but two sets of 5 questions with 10 points for each set. Of course you need two disjoint sets, which creates a further difficulty. If enough questions are available that the overall cardinality is not reduced below viability, you can extract five even-numbered and five odd-numbered questions.

If there are few viable questions, though, the query should weight much less; so, if the query is too large, this means that many combinations are viable (sum to 20), and chances of eliminating most of them by partitioning the table are slim.

Upvotes: 1

na-98

Reputation: 889

Here is another approach. May not be the best optimal solution but it does it in SQL only using MySQL Routine. You obviously want to clean up the DEFINER if you want to use in production.

First create a MySQL routine in the respective DB. In my case it's test yours should be something different.

USE `test`;
DROP procedure IF EXISTS `findTenRandQuestions`;

DELIMITER $$
USE `test`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `findTenRandQuestions`()
BEGIN

DECLARE continueFinding int;

SET continueFinding = 1;

DROP TEMPORARY TABLE IF EXISTS test.tenRandQuestion;
CREATE TEMPORARY TABLE test.tenRandQuestion (idquestions int, value int);


while continueFinding = 1 DO

DELETE FROM test.tenRandQuestion;

INSERT INTO test.tenRandQuestion
select idquestions, value
from questions
where idsubject in (1,3,5,9)
order by rand()
limit 10;

IF ( (SELECT SUM(value) FROM test.tenRandQuestion) = 20) THEN
SET continueFinding = 0;
END IF;


END WHILE;

SELECT * FROM test.tenRandQuestion;    

END$$

DELIMITER ;

Now you can call this routine like this call test.findTenRandQuestions(); from your Java code. You see below result.

The downside here is the routine assumes there will always be at least 10 questions whose value sum will be 20. If that isn't the case the routine could run forever so you want to add some logic to break out after certain iterations. You can simply add a counter and break after 10 iterations as an example.

Upvotes: 1

Serg

Reputation: 22811

10 self joins, assuming all values are positive.

 SELECT Q1.id, Q2.id, .., Q10.id
 FROM question Q1
 JOIN question Q2
   ON Q1.id < Q2.id and Q1.idsubject IN (1,3,5,9) and Q2.idsubject IN (1,3,5,9)
      and Q1.value + Q2.value < 13
 JOIN question Q3
   ON Q2.id < Q3.id and Q3.idsubject IN (1,3,5,9)
      and Q1.value + Q2.value + Q3.value < 14
 ..
 JOIN question Q10
   ON Q9.id < Q10.id and Q10.idsubject IN (1,3,5,9)
      and Q1.value + Q2.value + Q3.value .. + Q10.value = 20 
 ORDER BY rand()
 LIMIT 1

Upvotes: 1

Juan Carlos Oropeza

Reputation: 48197

First you need do all 10 question combinations

 SELECT Q1.id, Q2.id, ......, Q10.id
 FROM question Q1
 JOIN question Q2
   ON Q1.id < Q2.id
 JOIN question Q3
   ON Q2.id < Q3.id
 ....
 JOIN question Q10
   ON Q9.id < Q10.id

Then find out which one add 20 points

 WHERE Q1.value + Q2.value + .... Q10.value = 10
   AND Q1.idsubject IN (1,3,5,9)
   AND Q2.idsubject IN (1,3,5,9)
   ...
   AND Q10.idsubject IN (1,3,5,9)

Then select one randomly

 ORDER BY rand()
 LIMIT 1

For example one exam with 3 question and pool of 5 questions you have the following combination

 Q1 < Q2 < Q3

 1,2,3
 1,2,4
 1,2,5
 2,3,4
 2,3,5
 3,4,5

I calculate all the combination and check which one add to 20.

Upvotes: 1

Make the sum of a column a specific value query sql

Answers (4)

Semi-SQL

Divide and conquer

Related Questions