hellzone

Reputation: 5246

How to speed up an UPDATE query that uses the LIKE operator

When I run the update query below, it takes about 40 hours to complete. So I added a time limitation (see *Update query with time limitation*), but it still takes nearly the same time. Is there any way to speed up this update?

EDIT: What I really want to do is select only the logs between two specific dates and run this update query on those records.

create table user
(userid varchar(30));

create table logs
( log_time timestamp,
log_detail varchar(100),   
userid varchar(30));

insert into user values('user1');
insert into user values('user2');
insert into user values('user3');
insert into user values('');

insert into logs (log_detail, userid) values('no user mentioned','user3');
insert into logs (log_detail, userid) values('inserted by user2','user2');
insert into logs (log_detail, userid) values('inserted by user3',null);

Table before Update

| log_time | log_detail        | userid |
|----------|-------------------|--------|
| ..       | no user mentioned | user3  |
| ..       | inserted by user2 | user2  |
| ..       | inserted by user3 | (null) |

Update query

update logs join user
set logs.userid=user.userid
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";

Update query with time limitation

update logs join user
set logs.userid = IF(logs.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41', user.userID, null)
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";

Table after update

| log_time | log_detail        | userid |
|----------|-------------------|--------|
| ..       | no user mentioned | user3  |
| ..       | inserted by user2 | user2  |
| ..       | inserted by user3 | user3  |

EDIT: Original question: Sql update statement with variable.

Upvotes: 0

Views: 117

Answers (4)

javaauthority

Reputation: 429

You could add a new column, log_detail_reverse, maintained by a trigger: whenever a row is inserted, store log_detail in reverse character order using MySQL's reverse() function. In your update query, reverse the userID search as well and anchor the pattern at the start. The net effect is that you transform your INDEX SCAN into an INDEX SEEK, which is much faster.

update logs join user 
set logs.userid=user.userid 
where logs.log_time between '2015-08-01' and '2015-08-11' 
and logs.log_detail_reverse LIKE concat(reverse(user.userID), '%');
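
For the query above to seek, the extra column needs to exist and be indexed. A sketch of the one-time setup (the index name here is my own choice; the question's DDL creates neither the column nor any index):

```sql
-- Add the reversed-detail column and index it so the anchored
-- LIKE 'prefix%' pattern can use a range scan:
ALTER TABLE logs ADD COLUMN log_detail_reverse varchar(100);
CREATE INDEX idx_log_detail_reverse ON logs (log_detail_reverse);

-- Backfill existing rows once; the trigger handles new rows:
UPDATE logs SET log_detail_reverse = reverse(log_detail);
```

Note this trick only helps because the userID appears at the *end* of log_detail ("inserted by user2"), so it becomes a prefix once reversed.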

MySQL Trigger

The Trigger could be something like:

DELIMITER //

CREATE TRIGGER log_details_in_reverse
BEFORE INSERT
ON logs FOR EACH ROW

BEGIN

-- A BEFORE INSERT trigger can set the new column directly on the row
-- being inserted. (MySQL does not allow an AFTER INSERT trigger to
-- UPDATE the table it fires on, so no primary-key lookup is needed;
-- also note mysql_insert_id() is a C API function, not SQL.)
SET NEW.log_detail_reverse = reverse(NEW.log_detail);

END; //

DELIMITER ;

Upvotes: 1

javaauthority

Reputation: 429

Log tables easily fill up with tons of rows each month, and even the best indexing won't help in the case of a leading-wildcard LIKE. Your log_detail column is 100 characters long and your search pattern is CONCAT("%",user.userID,"%"). Wrapping columns in functions also adds per-row computation. If your userID is John, you're searching for %John%, so the query has to scan every row in the table: the leading % makes any index on log_detail unusable. Without the leading %, the query could use the index efficiently. In effect, your query does an INDEX SCAN as opposed to an INDEX SEEK.

For more information on these concepts, see:

Index Seek VS Index Scan

Query tuning a LIKE operator
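
To see the difference yourself, compare the plans MySQL reports for the two pattern shapes (this assumes an index on log_detail exists, which the question's DDL does not create):

```sql
-- Leading wildcard: the index cannot be used, every row is examined
-- (EXPLAIN shows access type ALL).
EXPLAIN SELECT * FROM logs WHERE log_detail LIKE '%user2%';

-- Anchored prefix: the index can be range-scanned, i.e. a seek
-- (EXPLAIN shows access type range).
EXPLAIN SELECT * FROM logs WHERE log_detail LIKE 'user2%';
```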

Alright, what can you do about this? Two strategies.

  • Option 1 is to limit the number of rows that you're searching through. You had the right idea using a time limitation to cut down the candidate rows. One caveat: the order of expressions in the WHERE clause doesn't by itself control execution; the optimizer picks the plan. What matters is that the range predicate on log_time can use an index, so the database narrows the rows first and applies the expensive LIKE only to that subset.

    update logs join user 
    set logs.userid=user.userid 
    where logs.log_time between '2015-08-01' and '2015-08-11' 
    and logs.log_detail LIKE concat('%',user.userID,'%')
    
  • Option 2 depends on how much control you have over the database. If you have total control (and the time and money), MySQL offers Auto-Sharding, available in MySQL Cluster and MySQL Fabric. I won't go over those products in much detail, as the links below explain them better than I could summarize, but the idea behind sharding is to split the rows horizontally across several sister tables (or servers), so you're not searching through one long table but through several smaller ones at the same time. Searching through 10 tables of 10 million rows is faster than searching through 1 table of 100 million rows.

Database Sharding - Wikipedia

MySQL Cluster

MySQL Fabric
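
For Option 1 to pay off, log_time itself must be indexed; the question's DDL creates no indexes at all, so something like this (index name is my own) is a prerequisite:

```sql
-- Without this, the BETWEEN range predicate on log_time still
-- forces a full table scan before the LIKE is even evaluated.
CREATE INDEX idx_logs_log_time ON logs (log_time);
```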

Upvotes: 1

Thorsten Kettner

Reputation: 95101

One thing about speeding up updates is not to update records that need no update. You only want to update records in a certain time range where the user doesn't match the user mentioned in the log text. Hence limit the records to be updated in your where clause.

update logs 
set userid = substring_index(log_detail, ' ', -1)
where log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41'
and not userid <=> substring_index(log_detail, ' ', -1);
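
The null-safe operator <=> in the last line matters here: with a plain != comparison, rows whose userid is NULL would compare to NULL and be silently filtered out instead of updated. A quick illustration:

```sql
-- Plain inequality against NULL yields NULL, so the WHERE filter drops the row:
SELECT NULL != 'user3';   -- NULL

-- Null-safe comparison yields 0, so NOT (NULL <=> 'user3') is 1
-- and the row qualifies for the update:
SELECT NULL <=> 'user3';  -- 0
```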

Upvotes: 0

Gordon Linoff

Reputation: 1271111

First, the right place for the time limitation is the where clause, not an IF():

update logs l left join
       user u
       on l.log_detail LIKE concat("%", u.userID)
    set l.userid = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

If you want to set the others to NULL do this before:

update logs l
     set l.userid = NULL
     where l.log_time not between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

But, if you really want this to be fast, you need to use an index for the join. It is possible that this will use an index on users(userid):

update logs l left join
       user u
       on cast(substring_index(l.log_detail, ' ', -1) as char) = u.userID
    set l.userid = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

Look at the explain on the equivalent select. It is really important that the cast() be to the same type as the UserId.
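
For example, the equivalent select to feed to EXPLAIN could look like this (since userid is varchar(30) here, substring_index() already returns a matching string type, so both sides compare as varchars):

```sql
-- Check which access type and index MySQL chooses for the join
-- before running the actual update:
EXPLAIN
SELECT l.log_time, l.log_detail, u.userID
FROM logs l
LEFT JOIN user u
  ON substring_index(l.log_detail, ' ', -1) = u.userID
WHERE l.log_time BETWEEN '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';
```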

Upvotes: 1
