Reputation: 5246
When I try to run the update query below, it takes about 40 hours to complete. So I added a time limitation (see "Update query with time limitation"), but it still takes nearly the same time to complete. Is there any way to speed up this update?
EDIT: What I really want to do is get only the logs between some specific dates and run this update query on those records.
create table user
(userid varchar(30));
create table logs
( log_time timestamp,
log_detail varchar(100),
userid varchar(30));
insert into user values('user1');
insert into user values('user2');
insert into user values('user3');
insert into user values('');
insert into logs (log_detail, userid) values('no user mentioned','user3');
insert into logs (log_detail, userid) values('inserted by user2','user2');
insert into logs (log_detail, userid) values('inserted by user3',null);
Table before Update
log_time | log_detail        | userid
---------|-------------------|-------
..       | no user mentioned | user3
..       | inserted by user2 | user2
..       | inserted by user3 | (null)
Update query
update logs join user
set logs.userid=user.userid
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";
Update query with time limitation
update logs join user
set logs.userid = IF(logs.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41', user.userID, null)
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";
Table after update
log_time | log_detail        | userid
---------|-------------------|-------
..       | no user mentioned | user3
..       | inserted by user2 | user2
..       | inserted by user3 | user3
EDIT: Original question: Sql update statement with variable.
Upvotes: 0
Views: 117
Reputation: 429
You could add a new column called log_detail_reverse, maintained by a trigger so that when you insert a new row, you also store the log_detail column in reverse character order using the MySQL function reverse(). When you run your update query, you reverse the userID search pattern as well. The net effect is that the leading wildcard disappears, transforming your INDEX SCAN into an INDEX SEEK, which will be much faster.
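For this to work, the reversed column has to exist and be indexed; a minimal sketch, assuming the column and index names below (they are not from the original answer):

alter table logs add column log_detail_reverse varchar(100);
create index idx_log_detail_reverse on logs (log_detail_reverse);
-- one-time backfill for rows inserted before the trigger existed
update logs set log_detail_reverse = reverse(log_detail);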
update logs join user
set logs.userid=user.userid
where logs.log_time between '2015-08-01' and '2015-08-11'
and logs.log_detail_reverse LIKE concat(reverse(user.userID), '%')
The trigger could be something like:
DELIMITER //
CREATE TRIGGER log_details_in_reverse
BEFORE INSERT
ON logs FOR EACH ROW
BEGIN
    -- A BEFORE INSERT trigger can set NEW column values directly,
    -- so no primary key lookup or second UPDATE is needed (MySQL
    -- would reject an UPDATE of logs from inside a trigger on logs).
    SET NEW.log_detail_reverse = reverse(NEW.log_detail);
END; //
DELIMITER ;
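With the trigger in place, new rows get the reversed text automatically; for example:

insert into logs (log_time, log_detail, userid) values (now(), 'inserted by user2', null);
-- log_detail_reverse now holds '2resu yb detresni'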
Upvotes: 1
Reputation: 429
Log tables can easily fill up with tons of rows each month, and even the best indexing won't help, especially in the case of a LIKE operator. Your log_detail column is 100 characters long and your search pattern is concat("%", user.userID, "%"). Using a function in a SQL command can slow things down because of the extra computation, and what you're searching for, if your userID is John, is %John%. The leading % makes indexes semi-useless, so your query will scan every row in that table. Without the leading %, the query would be able to use its indexes efficiently. In effect, your query does an INDEX SCAN as opposed to an INDEX SEEK.
Alright, what can you do about this? Two strategies.

Option 1 is to limit the number of rows that you're searching through. You had the right idea using time limitations to reduce the number of rows. Put the time limitation in your WHERE clause and back it with an index on log_time; the optimizer can then narrow the candidate rows with the cheap range condition before applying the expensive LIKE to what remains.
update logs join user
set logs.userid=user.userid
where logs.log_time between '2015-08-01' and '2015-08-11'
and logs.log_detail LIKE concat('%',user.userID,'%')
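For that range condition to prune rows cheaply, logs needs an index on log_time; a minimal sketch (the index name is an assumption):

create index idx_logs_log_time on logs (log_time);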
Option 2 depends on your control of the database. If you have total control (and the time and money), MySQL has a feature called Auto-Sharding, available in MySQL Cluster and MySQL Fabric. I won't go over those products in much detail, but the idea behind sharding is to split the rows into horizontal tables, so to speak. You're then not searching through one long table, but across several sister tables at the same time: searching through 10 tables of 10 million rows is faster than searching through 1 table of 100 million rows.
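A related, single-server illustration of the same horizontal-split idea is MySQL's built-in table partitioning (a different feature from Cluster/Fabric sharding); a minimal sketch, with partition names assumed, which works on the example schema because logs has no unique keys that would have to include the partition column:

alter table logs
partition by range (to_days(log_time)) (
    partition p201507 values less than (to_days('2015-08-01')),
    partition p201508 values less than (to_days('2015-09-01')),
    partition pmax values less than maxvalue
);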
Upvotes: 1
Reputation: 95101
One key to speeding up updates is not to touch records that need no update. You only want to update records in a certain time range where userid doesn't already match the user mentioned in the log text, so limit the records to be updated in your where clause.
update logs
set userid = substring_index(log_detail, ' ', -1)
where log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41'
and not userid <=> substring_index(log_detail, ' ', -1);
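Here substring_index(log_detail, ' ', -1) returns everything after the last space, which in this data is the user name, and <=> is MySQL's NULL-safe equality operator, so rows with a NULL userid are compared correctly. For example:

select substring_index('inserted by user2', ' ', -1);  -- returns 'user2'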
Upvotes: 0
Reputation: 1271111
First, the right place to put the time limitation is in the where clause, not an if():
update logs l left join
user u
on l.log_detail LIKE concat("%", u.userID)
set l.userid = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';
If you want to set the others to NULL, do this before:
update logs l
set l.userid = NULL
where l.log_time not between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';
But, if you really want this to be fast, you need to use an index for the join. It is possible that this will use an index on user(userid):
update logs l left join
user u
on cast(substring_index(l.log_detail, ' ', -1) as char(30)) = u.userID
set l.userid = u.userID
where l.log_time between '2015-08-11 00:39:41' AND '2015-08-01 17:39:44';
Look at the explain on the equivalent select. It is really important that the cast() be to the same type as the UserId.
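For instance, the check might look like this (same predicates as the update above):

explain select u.userID
from logs l left join
     user u
     on cast(substring_index(l.log_detail, ' ', -1) as char(30)) = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';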
Upvotes: 1