Reputation: 865
I've come across some similar questions, but I still don't quite understand.
I have a MySQL database with a table in which I store data of people and unique codes. The base unit for me is the e-mail address. I want to select random records, but limit them so a specific email column can be selected only once. This is my table structure (I'm leaving some columns, that are not relevant to this question).
+-----+-------------------+---------+----------+----------+
| ID | email | name | lastname | code |
+-----+-------------------+---------+----------+----------+
| 1 | [email protected] | Simon | Hardy | 123ABC |
| 2 | [email protected] | John | Doe | EEEEEE |
| 3 | [email protected] | John | Doe | AEAEAE |
| 4 | [email protected] | Bill | Liebe | 5D78AC |
| 5 | [email protected] | Ellen | Petete | 99AQE5 |
| 6 | [email protected] | John | Doe | 000CVV |
| 7 | [email protected] | Peter | Lorem | 54ACSS |
| 8 | [email protected] | Emma | Stone | 98WW7Q |
+-----+-------------------+---------+----------+----------+
If I limit my selection to 3 rows and somehow the row with email = [email protected] got selected, I need the other two rows with this email to be ignored/skipped. This is my query now:
SELECT * FROM people ORDER BY RAND() LIMIT 3
PS: I know "ORDER BY RAND()" is slow, I just didn't focuse on that part yet.
I was thinking about GROUP BY, but as far as I understand that way I would only get that one column, I need to fetch all of them.
Is there a straightforward solution for this in MySQL?
Upvotes: 1
Views: 71
Reputation: 1
EDIT**
SELECT * FROM Test WHERE id IN (SELECT MIN(id) FROM Test GROUP BY email) LIMIT 3;
this should do.
Upvotes: 0
Reputation: 1269463
In MySQL 8+, you can use:
SELECT p.*
FROM people p
ORDER BY ROW_NUMBER() OVER (PARTITION BY email ORDER BY RAND())
LIMIT 3;
If you want to solve both the performance problem and the duplication problem at the same time . . . that is challenging. My recommendation is to select a smaller number of rows and just "hope" that there are enough different emails.
For instance, for 3 emails you might want to get about 100 rows with something like this:
select p.*,
(@rn := if(@e = email, @rn + 1,
if(@e := email, 1, 1)
)
) as rn
from (select p.*
from people p cross join
(select count(*) as cnt from people) pp -- can use primary key index
where rand() < (100 / cnt) -- get about 100 rows
order by email, rand() -- only on about 100 rows
) p cross join
(select @e := '', @rn := 0) params
having rn = 1
limit 3;
Upvotes: 2