Reputation: 263
I got myself into a MySQL design scalability issue. Any help would be greatly appreciated.
The requirements:
Storing users' SOCIAL_GRAPH and USER_INFO about each user in their social graph. Many concurrent reads and writes per second occur. Dirty reads acceptable.
Current design:
We have 2 (relevant) tables. Both InnoDB for row locking, instead of table locking.
USER_SOCIAL_GRAPH table that maps a logged in (user_id) to another (related_user_id). PRIMARY key composite user_id and related_user_id.
USER_INFO table with information about each related user. PRIMARY key is (related_user_id).
Note 1: No relationships defined.
Note 2: Each table is now about 1GB in size, with 8 million and 2 million records, respectively.
Simplified table SQL creates:
CREATE TABLE `user_social_graph` (
`user_id` int(10) unsigned NOT NULL,
`related_user_id` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`related_user_id`),
KEY `user_idx` (`user_id`)
) ENGINE=InnoDB;
CREATE TABLE `user_info` (
`related_user_id` int(10) unsigned NOT NULL,
`screen_name` varchar(20) CHARACTER SET latin1 DEFAULT NULL,
[... and many other non-indexed fields irrelevant]
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`related_user_id`),
KEY `last_updated_idx` (`last_updated`)
) ENGINE=InnoDB;
MY.CFG values set:
innodb_buffer_pool_size = 256M
key_buffer_size = 320M
Note 3: Memory available 1GB, these 2 tables are 2GBs, other innoDB tables 3GB.
Problem:
The following example SQL statement, which needs to access all records found, takes 15 seconds to execute (!!) and num_results = 220,000:
SELECT SQL_NO_CACHE COUNT(u.related_user_id)
FROM user_info u LEFT JOIN user_socialgraph u2 ON u.related_user_id = u2.related_user_id
WHERE u2.user_id = '1'
AND u.related_user_id = u2.related_user_id
AND (NOT (u.related_user_id IS NULL));
For a user_id with a count of 30,000, it takes about 3 seconds (!).
EXPLAIN EXTENDED for the 220,000 count user. It uses indices:
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
| 1 | SIMPLE | u2 | ref | user_user_idx,user_idx | user_idx | 4 | const | 157320 | 100.00 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | u2.related_user_id | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
How do we speed these up without setting innodb_buffer_pool_size to 5GB?
Thank you!
Upvotes: 2
Views: 982
Reputation: 423
What is the MySQL version? Its manual contains important information for speeding up statements and code in general;
Change your paradigm to a data warehouse capable to manage till terabyte table. Migrate your legacy MySQL data base with free tool or application to the new paradigm. This is an example: http://www.infobright.org/Downloads/What-is-ICE/ many others (free and commercial).
PostgreSQL is not commercial and there a lot of tools to migrate MySQL to it!
Upvotes: 0
Reputation: 44333
The user_social_graph table is not indexed correctly !!!
You have ths:
CREATE TABLE user_social_graph
(user_id
int(10) unsigned NOT NULL,
related_user_id
int(11) NOT NULL,
PRIMARY KEY (user_id
,related_user_id
),
KEY user_idx
(user_id
))
ENGINE=InnoDB;
The second index is redundant since the first column is user_id. You are attempting to join the related_user_id column over to the user_info table. That column needed to be indexed.
Change user_social_graphs as follows:
CREATE TABLE user_social_graph
(user_id
int(10) unsigned NOT NULL,
related_user_id
int(11) NOT NULL,
PRIMARY KEY (user_id
,related_user_id
),
UNIQUE KEY related_user_idx
(related_user_id
,user_id
))
ENGINE=InnoDB;
This should change the EXPLAIN PLAN. Keep in mind that the index order matters depending the the way you query the columns.
Give it a Try !!!
Upvotes: 1