Patrick Burns
Patrick Burns

Reputation: 1833

How can i optimize update SQL query with "WHERE NOT IN (...more than 1000 items...)"?

I have table with this fields: ID, USERNAME, DATA, and IS_ONLINE. Each minute i receive new large part of data from remote server with current online users [(username, is_online, data), ...].

And i need set IS_ONLINE = 1 only for users in this array and for other users set IS_ONLINE = 0. In loop i making queries UPDATE users SET is_online = 1, data = 'data' WHERE username = 'username'. And then to the set IS_ONLINE = 0 i execute this query:

update users set is_online = 0 where username not in ('user1', 'user2', ... , 'user1000')

And it works well if i have small amount of users, but if users table contains more than 100 thousands lines this query executing more than 50 seconds.

Please can you give me answer or idea how i can increase speed of this simple query.

Thank you!

Upvotes: 0

Views: 136

Answers (1)

Rick James
Rick James

Reputation: 142518

Currently, the NOT IN query is touching the entire 100,000 rows. Let's avoid that by changing the query and the schema. Instead of setting is_online for all users, let's devise a way so that you only have to touch those that are currently online.

Instead of having is_online being a boolean, let's make it either a DATETIME or BIGINT. Then set it to a new, higher, value for those that are online, ignoring the others.

That will need one other bit of info -- the new high value. Simply store it in another table with a single row with a single column.

This schema avoids the timestamp kludges in the Comments. The only users that are "online" are those with the latest value in is_online; all others are offline.

Now let's check the rest of the schema. Do you really need both id and username? Toss id and make username the PRIMARY KEY; that will speed things up because it won't have to do a double lookup -- first to find the id, given the username, second to get to the row to UPDATE it. (Caveat: There may be issues if it is a FOREIGN KEY from another table. If so, please provide more details.)

Another tip -- Using 1000 statements to touch 1000 rows is a lot slower than using a single query. To do multiple updates of different values, use IODKU.

I'm hoping this will all boil down to a 3 sql statements:

BEGIN;
$hv = SELECT high_value FROM HighValue FOR UPDATE;
$hv++;
INSERT INTO MainTable
    VALUES
        ('user1', 'blah1', $hv),
        ('user2', 'blah2', $hv),
        ...
    ON DUPLICATE KEY UPDATE 
        data = VALUES(data),   -- sets the new `data` value
        is_online = $hv ;
UPDATE HighValue SET high_value = $hv;
COMMIT;

Upvotes: 1

Related Questions