user2696962
user2696962

Reputation: 575

Constructing a DB for best performance

I'm working on "online streaming" project and I need some help in constructing a DB for best performance. Currently I have one table containing all relevant information for the player including file, poster image, post_id etc.

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| post_id       | int(11)     | YES  |     | NULL    |                |
| file          | mediumtext  | NO   |     | NULL    |                |
| thumbs_img    | mediumtext  | YES  |     | NULL    |                |
| thumbs_size   | mediumtext  | YES  |     | NULL    |                |
| thumbs_points | mediumtext  | YES  |     | NULL    |                |
| poster_img    | mediumtext  | YES  |     | NULL    |                |
| type          | int(11)     | NO   |     | NULL    |                |
| uuid          | varchar(40) | YES  |     | NULL    |                |
| season        | int(11)     | YES  |     | NULL    |                |
| episode       | int(11)     | YES  |     | NULL    |                |
| comment       | text        | YES  |     | NULL    |                |
| playlistName  | text        | YES  |     | NULL    |                |
| time          | varchar(40) | YES  |     | NULL    |                |
| mini_poster   | mediumtext  | YES  |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

With 100k records it takes around 0.5 sec for a query and performance constantly degrading as I have more records.

+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                |
+----------+------------+----------------------------------------------------------------------+
|        1 | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+

explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';

+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | dle_playerFiles | ALL  | NULL          | NULL | NULL    | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+

How can I improve DB structure? How big websites like youtube construct their database?

Upvotes: 0

Views: 83

Answers (2)

Monty Wild
Monty Wild

Reputation: 3991

If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.

Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1', an index on post_id and type would be advisable.

Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *. The fewer fields, the quicker the query will return.

Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint or an int when a mediumint, smallint or better still, a tinyint will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.

Upvotes: 1

Brandon
Brandon

Reputation: 10028

Generally when query time is directly proportional to the number of rows, that suggests a table scan, which means for a query like

SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'

The database is executing that literally, as in, iterate over every single row and check if it meets criteria.

The typical solution to this is an index, which is a precomputed list of values for a column (or set of columns) and a list of rows which have said value.

If you create an index on the post_id column on dle_playerFiles, then the index would essentially say

1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>

Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain it.

Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.

This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:

  1. Storage space - this is more data and it has to be stored somewhere
  2. Update time - databases keep indexes in sync with changes to the table automatically, which means that INSERT, UPDATE and DELETE statements will take longer because they need to update the data. For small and efficient indexes, this tradeoff is usually worth it.

For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:

create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)

Caveats to this working efficiently:

  1. SELECT * is bad here. If you are returning every column, then the database must go to the table to read the columns because the index only contains the columns for filtering. If you really only need one or two of the columns, specify them explicitly in the SELECT clause and add them to your index. Do NOT do this for many columns as it just bloats the index.
  2. Functions and type conversions tend to prevent index usage. Your SQL wraps the integer types post_id and type in quotes so they are interpreted as strings. The database may feel that an index can't be used because it has to convert everything. Remove the quotes for good measure.

Upvotes: 2

Related Questions