Voriki
Voriki

Reputation: 1637

Storing large 2D array in DB, performance concerns when joining table

I'm storing 24-hour weekly schedules, meaning each person has a 24x7 2D array (availability[time][day]), totaling 168 elements per person. In searches through users, availability is a filter, which means these elements have to be stored in a table (availabilities).

Part of the schema for availabilities:

+---------+----------------+
| Field   | Type           |
+---------+----------------+
| user_id | int(10)        |
| time    | varchar(4)     |
| mon     | tinyint(1)     |
| tue     | tinyint(1)     |
| wed     | tinyint(1)     |
| thu     | tinyint(1)     |
| fri     | tinyint(1)     |
| sat     | tinyint(1)     |
| sun     | tinyint(1)     |
+---------+----------------+

Sample select (each user would actually have 24 rows for the whole day):

+---------+------+-----+-----+-----+-----+-----+-----+-----+
| user_id | time | mon | tue | wed | thu | fri | sat | sun |
+---------+------+-----+-----+-----+-----+-----+-----+-----+
| 1       | 6am  |   1 |   0 |   1 |   1 |   1 |   0 |   0 |
| 1       | 7am  |   1 |   0 |   1 |   1 |   1 |   0 |   0 |
| 1       | 8am  |   1 |   0 |   1 |   0 |   1 |   0 |   0 |
| 1       | 9am  |   0 |   0 |   0 |   1 |   0 |   0 |   0 |
| 1       | 10am |   0 |   0 |   0 |   1 |   0 |   0 |   1 |
| 1       | 11am |   1 |   0 |   1 |   0 |   1 |   0 |   1 |
| 1       | 12pm |   1 |   0 |   1 |   1 |   1 |   0 |   1 |
+---------+------+-----+-----+-----+-----+-----+-----+-----+

My concern is that this this table will be massive, and joining and parsing it will slow things down immensely. The availability filter will be the last one applied, but the potential set of users returned might still be large.

My questions:

  1. Is there a more effective way to store this information so that the table isn't as large? Would serializing the array and saving it to one field on the users table (e.g. users.availability) help performance? (there'd be more parsing, but the massive join would be skipped)

  2. Is the size of the table really an issue? This is my first large application, so I'm not sure if this table actually is large enough to worry about. (E.g. if 25 users are returned, the availability table will have 4,800 fields [not including user_id])

Upvotes: 0

Views: 137

Answers (1)

FredTheWebGuy
FredTheWebGuy

Reputation: 2586

You only have to start worrying about performance when you are approaching tens of millions of rows. I'm not seeing any issue here, except for a little premature optimization on your part:)

Since you are already starting off on the right foot, it seems, by going the normalized route, performance shouldn't be too much of a concern yet. Serializing the schedules into an array would be too much unnecessary work:

EXAMPLE: What if you wanted to search for all users scheduled at X hour on Y day? If it were stored in an array, every row would have to be parsed and searched individually for the times and dates. You'd be back to square one- and addressing a serious concern about performance.

Put an

EXPLAIN EXTENDED 

before your queries to see what's happening behind the scenes. So long as your joins are searching for the rows by their indexes, your app should fly.

Upvotes: 1

Related Questions