Reputation: 1763
This is my first scalability-related question.
In order to simplify the problem, I will use a bingo app idea:
We have a bingo application. Each users has a ticket with 15 random numbers out of 90. Each week the bingo is held to find the winner. Numbers are drawn live until there is a winner. Ex:
Which way is better / faster for representing data in table and for searching that table? Table will have 10+ million rows
Table Tickets
id user_id week ticket created
====================================================================================================
1 100022312 1 1,3,5,7,9,14,15,77,78,79,80,81,82,83,84 <timestamp>
2 102232123 1 2,5,9,22,33,44,55,66,77,,78,79,80,88,89 <timestamp>
3 201141028 1 7,8,9,11,22,33,34,35,37,39,51,55,58,63,66 <timestamp>
...
9.000.000 126387125 1 8,18,28,38,48,58,68,78,79,80,81,82,83,84,85 <timestamp>
10.000.000 126387126 1 1,4,14,24,34,45,56,66,67,68,79,80,81,82,83 <timestamp>
Php
$drawn_numbers = '1,2,3,4,5,6,7,89,10,11,12,13,14,15,16,17,18,19,20,21,22, ...';
$result = query("SELECT * FROM Tickets WHERE sksf('$drawn_numbers')");
where sksf
would be some kind of substring function / regex / LIKE done in MySql.
Table Tickets
id user_id week n1 n2 n3 ... n15 created
=============================================================================
1 100022312 1 11 32 52 ... 76 <timestamp>
2 102232123 1 22 52 55 78 <timestamp>
3 201141028 1 77 82 83 ... 89 <timestamp>
...
9.000.000 126387125 1 81 55 32 ... 10 <timestamp>
10.000.000 126387126 1 12 42 13 ... 77 <timestamp>
Php
$drawn_numbers = '1,2,3,4,5,6,7,89,10,11,12,13,14,15,16,17,18,19,20,21,22, ...';
$result = query("SELECT * FROM Tickets WHERE contidion1 AND condition2 AND ...");
Unfortunately, here I still don't have any idea about the conditions.
I select all tickets, iterate though them by checking if the drawn numbers contain any ticket.
$drawn_numbers = '1,2,3,4,5,6,7,89,10,11,12,13,14,15,16,17,18,19,20,21,22, ...';
$all_tickets = query("SELECT * FROM Tickets");
foreach ($all_tickets as $ticket) {
if $drawn_numbers.contains($ticket['ticket'])
return $ticket['id'];
}
Will having sorted numbers help in anyway? (Those 15 numbers and the drawn numbers)
What happens when week 2 arrives? Should I use the same table, adding a condition WHERE week=2
, or is it better to have only 1 week per table?
In the original game, a ticket has 15 numbers in 3 rows, each row having 5 numbers. After each number is drawn live, they are able to also calculate the tickets which have one row found in the drawn numbers (also they know the tickets with 2 rows). A ticket having 3 rows found in the drawn numbers would mean winning ticket.
This information makes me think that the representation looks something like:
Table Tickets
id user_id week row1 row2 row3 created
===============================================================================================================
1 100022312 1 1, 3, 5, 7, 9 14,15,77,78,79, 80,81,82,83,84 <timestamp>
2 102232123 1 2, 5, 9,22,33, 44,55,66,77,78, 79,80,87,88,89 <timestamp>
3 201141028 1 7, 8, 9,11,22, 33,34,35,37,39, 51,55,58,63,66 <timestamp>
...
9.000.000 126387125 1 8,18,28,38,48, 58,68,78,79,80, 81,82,83,84,85 <timestamp>
10.000.000 126387126 1 1, 4,14,24,34, 45,56,66,67,68, 79,80,81,82,83 <timestamp>
__ 13 __ 33 40 __ __ 70 __
2 __ 22 37 __ 52 62 __ 81
__ 19 23 __ 44 __ 63 __ 89
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 23, 44, 63, 89
-> there is a winner, no more numbers are drawn.
-> Our ticket did not win jackpot, but we won one row [19, 23, 44, 63, 89] (free ticket)
-> One can win also 2 rows.
Upvotes: 3
Views: 1452
Reputation: 142278
I would use two BIGINT
columns, and have each bit represent one of the bingo values (1..90). It will take a little bit of manipulation to construct a set of balls, but the searching will be easy, the storage will be much more compact, etc.
Let's have 2 columns, one with numbers 1..60, the other with 61..90. (The choice is somewhat arbitrary, but makes it easy to visualize.) Now we can do it with one BIGINT
and one INT
.
IF($value <= 60, 1 << $value, 0) -- the bit for the BIGINT
IF($value > 60, 1 << ($value - 60), 0) -- the bit for the INT
Now or'ing together bits gives you a pair of numbers to represent a winning ticket. Use the |
Operator for this task.
Each player's current balls will also be or'd -- a new bit being turned on after each play.
Then the test becomes something like this:
-- The balls owned by a user:
user_60 BIGINT,
user_30 INT
-- winning balls (12 rows, a total of 5 bits on for each row)
win_60 BIGINT,
win_30 INT
-- Note: FREE SPACE should be pre-populated in both structures
BIT_COUNT(user_60 & win_60) +
BIT_COUNT(user_30 & win_30) = 5 -- he's a winner!
I have left out an important step -- the arrangement of the numbers on each user's board. That will takes some up-front work, especially since 15 of the numbers can only go in the first column of the card. Etc.
Your takeaway from my Answer is to work with bits not other structures.
Another thought is to have 5 SMALLINT UNSIGNED
, one for each column on the cards.
(For the future, in MySQL 8.0, BINARY(16)
would allow a single column to represent a set of 90 values, thereby making the code cleaner.)
re Idea 4
Build 3 bit patterns - one per "row" with 5 numbers. Compare them to the issued tickets: Foreach ticket, compare the 3 patterns; count how many match.
Upvotes: 3
Reputation: 48357
Neither.
One approach would be to use a bitmap of the ticket numbers. You'd need 2 BIGINTs to store the values. If the AND of the ticket numbers and chosen numbers returns the same value as the ticket numbers, then you have a match. But you can't use an index for that.
If it's possible to win by matching only some of the numbers, then both the approaches you cited will be very innefficient - you need to normalize your data:
create table ticket (
id integer auto_increment,
user_id /* appropriate type for FK to your data about users */,
week_number integer,
PRIMARY KEY (id),
);
create table numbers (
number integer not null,
ticket_id integer not null,
FOREIGN KEY ticket_id references ticket(id)
);
This looks like a carefully designed scenario to test out scalability approaches - are you asking us to do your homework here?
Upvotes: 1