Tim Biegeleisen
Tim Biegeleisen

Reputation: 521987

Simulating LAG in MySQL during an update

Yesterday an interesting question was asked which required updating a MySQL table using the LAG. Consider the following input table (left), and the desired output (right):

**INPUT**                                 **OUTPUT**
ID  TestDate    PerformanceStatus (PS)    ID  TestDate    PS  PreviousPerformanceStatus
1   15/03/2016  0                         1   15/03/2016  0   0
1   01/04/2016  2                         1   01/04/2016  2   0
1   05/05/2016  1                         1   05/05/2016  1   2
1   07/06/2016  1                         1   07/06/2016  1   1
2   15/03/2016  0                         2   15/03/2016  0   1
2   01/04/2016  2                         2   01/04/2016  2   0
2   05/05/2016  1                         2   05/05/2016  1   2
2   07/06/2016  3                         2   07/06/2016  3   1
2   23/08/2016  1                         2   23/08/2016  1   3

In other words, the goal is to assign to PreviousPerformanceStatus the value which existed in the record coming before, as ordered by ID then TestDate.

The accepted answer, given by @spencer7593, used a correlated subquery. However, what popped into my head first was to use a user variable. Here is how I answered:

SET @lag = 0;
UPDATE yourTable
SET PreviousPerformanceStatus = @lag,
    @lag:=PerformanceStatus
ORDER BY ID, TestDate

I was told that this answer is unstable, but I was wondering if someone could explain why something might go wrong, what would be happening in that case, and finally what could we do to use a user variable here to simulate LAG.

It is my understanding that the following SELECT query would have no issues at all:

SELECT PerformanceStatus,
       @lag AS PreviousPerformanceStatus,
       @lag:=PerformanceStatus
FROM yourTable
ORDER BY ID, TestDate

However, when doing an UPDATE there are other considerations to take into account.

Upvotes: 0

Views: 990

Answers (2)

Misa
Misa

Reputation: 109

The accepted answer in that thread is wrong. The best way that I have found ant tested is to use a CTE (WITH clause) and update the table from the CTE result set.

The SELECT proposed by spencer7593 will work but is quite inneficient (in my case it took one minute in order to update three columns). The UPDATE will fail with an error telling you that you can not update a table that your are using in the FROM clause of the SET target.

enter image description here

Another, much more efficient way is to use a CTE with LAG() and to grab updated values from it. However, you will need a unique key to be used as the join key between the CTE and your table if there is no other simple or composite unique key.

-- Create the table as per question
drop table if exists student;

create table if not exists student (
pk int auto_increment,
id int not null,
TestDate date not null,
PerformanceStatus int not null,
PreviousPerformanceStatus int null default null,
primary key (pk)
) engine=innodb;

insert into student(id, TestDate, PerformanceStatus, PreviousPerformanceStatus)
values  (1, '2016-03-15', 0, null),
        (1, '2016-04-01', 2, null),
        (1, '2016-05-05', 1, null),
        (1, '2016-06-07', 1, null),
        (2, '2016-03-15', 0, null),
        (2, '2016-04-01', 2, null),
        (2, '2016-05-05', 1, null),
        (2, '2016-06-07', 3, null),
        (2, '2016-08-23', 1, null);


-- Update PreviousPerformanceStatus using lag()
with p as
(
    select pk, id, testdate, performancestatus, 
    LAG(performancestatus, 1, 0) OVER (
        PARTITION BY id
        ORDER BY id, testdate asc
    ) as PreviousPerformanceStatus
    from student
)
update student t
inner join p 
on p.pk = t.pk  
set     
t.PreviousPerformanceStatus = p.PreviousPerformanceStatus;

You can replace the third parameter of LAG() with null instead of zero. I found this solution to be the most efficient among several other working solutions.

Upvotes: 0

P.Salmon
P.Salmon

Reputation: 17655

I don't think you can set variables in an update statement. Here's my reasoning- Given this

drop table if exists t;

create table t (ID int, TestDate date,   PerformanceStatus int, previousperformancestatus int);
insert into t values
(1 ,  '2016-03-15' , 0, null),                         
(1 ,  '2016-04-01' , 2, null),                         
(1 ,  '2016-05-05' , 1, null),                         
(1 ,  '2016-06-07' , 1, null),                         
(2 ,  '2016-03-15' , 0, null),                         
(2 ,  '2016-04-01' , 2, null),                         
(2 ,  '2016-05-05' , 1, null),                         
(2 ,  '2016-06-07' , 3, null),                         
(2 ,  '2016-08-23' , 1, null)
;

This code fails

MariaDB [sandbox]> SET @lag = 0;
Query OK, 0 rows affected (0.00 sec)

MariaDB [sandbox]> UPDATE T
    -> SET previousPerformanceStatus = @lag ,
    ->     @lag:=PerformanceStatus
    -> ORDER BY ID, TestDate;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '@lag:=PerformanceStatus
ORDER BY ID, TestDate' at line 3

Commenting out @lag:=PerformanceStatus This code runs

MariaDB [sandbox]> SET @lag = 0;
Query OK, 0 rows affected (0.00 sec)

MariaDB [sandbox]> UPDATE T
    -> SET previousPerformanceStatus = @lag
    -> #,@lag:=PerformanceStatus
    -> ORDER BY ID, TestDate;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 9  Changed: 0  Warnings: 0

Since code at least runs without error and the manual https://dev.mysql.com/doc/refman/5.7/en/update.html states "The SET clause indicates which columns to modify " my take on this is that you cannot set variables in an update statement so simulating lag is not possible using this method.

Upvotes: 1

Related Questions