Reputation: 1199
I have a table which roughly looks like so:
id | unique | group
----------------------------------
1 | aaa | group1
2 | bbb | group1
3 | ggg | group1
4 | iii | group2
5 | ooo | group2
6 | eee | group3
7 | ttt | group4
8 | qqq | group4
9 | nnn | group4
I would like to choose a somewhat random subset of roughly size N
from this table. The N
entries should belong to a subset of group
s, no matter which group
, no matter the group size and they should be randomly sorted on group-level. For example, for N==3
, I want either of
[ooo, iii, eee]
(i.e. entries from group2, then group3) or
[eee, iii, ooo]
(i.e. entries from group3, then group2) or
[ttt, nnn, qqq]
(i.e. entries from group4) or
[qqq, nnn, ttt]
(i.e. entries from group4) or
[eee, ggg, bbb, aaa]
(i.e. entries from group3, then group1) or
...
I'm fine with not having a group "completely" selected.
I do not do this very often and I prefer multiple queries over one complex. N
will be around 100
or so, the table has <10k entries and the programm runs once per day. N
does not need to be exacly matched. I am doing this in Python/Mysqldb but I'm rather interested in the general algorithm/approach.
Additionally, once the N
entries are processed, I would like to ignore them on the next run, either by simply removing them or by setting some additonal attribute. In the latter case, the queries would need to be adapted.
EDIT: Changed for clarification.
Upvotes: 0
Views: 59
Reputation: 41
I don't know, if this is the best possible solution, but at least it is a solution. Unfortunately, MySQL does not allow LIMIT in subqueries, so I used variables instead.
SELECT group FROM mytable GROUP BY group HAVING count(unique) = 3 ORDER BY rand() LIMIT 1 INTO @randomgroup;
SELECT unique FROM mytable WHERE group = @randomgroup ORDER BY unique;
Upvotes: 0