Reputation: 575
I'm working on "online streaming" project and I need some help in constructing a DB for best performance. Currently I have one table containing all relevant information for the player including file, poster image, post_id etc.
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| post_id | int(11) | YES | | NULL | |
| file | mediumtext | NO | | NULL | |
| thumbs_img | mediumtext | YES | | NULL | |
| thumbs_size | mediumtext | YES | | NULL | |
| thumbs_points | mediumtext | YES | | NULL | |
| poster_img | mediumtext | YES | | NULL | |
| type | int(11) | NO | | NULL | |
| uuid | varchar(40) | YES | | NULL | |
| season | int(11) | YES | | NULL | |
| episode | int(11) | YES | | NULL | |
| comment | text | YES | | NULL | |
| playlistName | text | YES | | NULL | |
| time | varchar(40) | YES | | NULL | |
| mini_poster | mediumtext | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
With 100k records it takes around 0.5 sec for a query and performance constantly degrading as I have more records.
+----------+------------+----------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+------------+----------------------------------------------------------------------+
| 1 | 0.04630675 | SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1' |
+----------+------------+----------------------------------------------------------------------+
explain SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1';
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | dle_playerFiles | ALL | NULL | NULL | NULL | NULL | 61777 | Using where |
+----+-------------+-----------------+------+---------------+------+---------+------+-------+-------------+
How can I improve DB structure? How big websites like youtube construct their database?
Upvotes: 0
Views: 83
Reputation: 3991
If I read your Duration correctly, it appears to take 0.04630675 (seconds?) to run your query, not 0.5s.
Regardless, proper indexing can decrease the time required to return query results. Based on your query SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
, an index on post_id
and type
would be advisable.
Also, if you don't absolutely require all the fields to be returned, use individual column references of the fields you require instead of the *
. The fewer fields, the quicker the query will return.
Another way to optimize a query is to ensure that you use the smallest data types possible - especially in primary/foreign key and index fields. Never use a bigint
or an int
when a mediumint
, smallint
or better still, a tinyint
will do. Never, ever use a text field in a PK or FK unless you have no other choice (this one is a DB design sin that is committed far too often IMO, even by people with enough training and experience to know better) - you're far better off using the smallest exact numeric type possible. All this has positive impacts on storage size too.
Upvotes: 1
Reputation: 10028
Generally when query time is directly proportional to the number of rows, that suggests a table scan, which means for a query like
SELECT * FROM dle_playerFiles where post_id in ('7000') AND type='1'
The database is executing that literally, as in, iterate over every single row and check if it meets criteria.
The typical solution to this is an index, which is a precomputed list of values for a column (or set of columns) and a list of rows which have said value.
If you create an index on the post_id column on dle_playerFiles, then the index would essentially say
1: <some row pointer>, <some row pointer>, <some row pointer>
2: <some row pointer>, <some row pointer>, <some row pointer>
...
100: <some row pointer>, <some row pointer>, <some row pointer>
...
7000: <some row pointer>, <some row pointer>, <some row pointer>
250000: <some row pointer>, <some row pointer>, <some row pointer>
Therefore, with such an index in place, the above query would simply look at node 7000 of the index and know which rows contain it.
Then the database only needs to read the rows where post_id is 7000 and check if their type is 1.
This will be much quicker because the database never needs to look at every row to handle a query. The costs of an index:
For your query, I recommend you create an index on 2 columns. Make them part of the same index, not 2 separate indexes:
create index ix_dle_playerFiles__post_id_type on dle_playerFiles (post_id, type)
Caveats to this working efficiently:
Upvotes: 2