Reputation: 521987
Yesterday an interesting question was asked which required updating a MySQL table using the LAG. Consider the following input table (left), and the desired output (right):
**INPUT** **OUTPUT**
ID TestDate PerformanceStatus (PS) ID TestDate PS PreviousPerformanceStatus
1 15/03/2016 0 1 15/03/2016 0 0
1 01/04/2016 2 1 01/04/2016 2 0
1 05/05/2016 1 1 05/05/2016 1 2
1 07/06/2016 1 1 07/06/2016 1 1
2 15/03/2016 0 2 15/03/2016 0 1
2 01/04/2016 2 2 01/04/2016 2 0
2 05/05/2016 1 2 05/05/2016 1 2
2 07/06/2016 3 2 07/06/2016 3 1
2 23/08/2016 1 2 23/08/2016 1 3
In other words, the goal is to assign to PreviousPerformanceStatus
the value which existed in the record coming before, as ordered by ID
then TestDate
.
The accepted answer, given by @spencer7593, used a correlated subquery. However, what popped into my head first was to use a user variable. Here is how I answered:
SET @lag = 0;
UPDATE yourTable
SET PreviousPerformanceStatus = @lag,
@lag:=PerformanceStatus
ORDER BY ID, TestDate
I was told that this answer is unstable, but I was wondering if someone could explain why something might go wrong, what would be happening in that case, and finally what could we do to use a user variable here to simulate LAG.
It is my understanding that the following SELECT
query would have no issues at all:
SELECT PerformanceStatus,
@lag AS PreviousPerformanceStatus,
@lag:=PerformanceStatus
FROM yourTable
ORDER BY ID, TestDate
However, when doing an UPDATE
there are other considerations to take into account.
Upvotes: 0
Views: 990
Reputation: 109
The accepted answer in that thread is wrong. The best way that I have found ant tested is to use a CTE (WITH clause) and update the table from the CTE result set.
The SELECT proposed by spencer7593 will work but is quite inneficient (in my case it took one minute in order to update three columns). The UPDATE will fail with an error telling you that you can not update a table that your are using in the FROM clause of the SET target.
Another, much more efficient way is to use a CTE with LAG() and to grab updated values from it. However, you will need a unique key to be used as the join key between the CTE and your table if there is no other simple or composite unique key.
-- Create the table as per question
drop table if exists student;
create table if not exists student (
pk int auto_increment,
id int not null,
TestDate date not null,
PerformanceStatus int not null,
PreviousPerformanceStatus int null default null,
primary key (pk)
) engine=innodb;
insert into student(id, TestDate, PerformanceStatus, PreviousPerformanceStatus)
values (1, '2016-03-15', 0, null),
(1, '2016-04-01', 2, null),
(1, '2016-05-05', 1, null),
(1, '2016-06-07', 1, null),
(2, '2016-03-15', 0, null),
(2, '2016-04-01', 2, null),
(2, '2016-05-05', 1, null),
(2, '2016-06-07', 3, null),
(2, '2016-08-23', 1, null);
-- Update PreviousPerformanceStatus using lag()
with p as
(
select pk, id, testdate, performancestatus,
LAG(performancestatus, 1, 0) OVER (
PARTITION BY id
ORDER BY id, testdate asc
) as PreviousPerformanceStatus
from student
)
update student t
inner join p
on p.pk = t.pk
set
t.PreviousPerformanceStatus = p.PreviousPerformanceStatus;
You can replace the third parameter of LAG() with null instead of zero. I found this solution to be the most efficient among several other working solutions.
Upvotes: 0
Reputation: 17655
I don't think you can set variables in an update statement. Here's my reasoning- Given this
drop table if exists t;
create table t (ID int, TestDate date, PerformanceStatus int, previousperformancestatus int);
insert into t values
(1 , '2016-03-15' , 0, null),
(1 , '2016-04-01' , 2, null),
(1 , '2016-05-05' , 1, null),
(1 , '2016-06-07' , 1, null),
(2 , '2016-03-15' , 0, null),
(2 , '2016-04-01' , 2, null),
(2 , '2016-05-05' , 1, null),
(2 , '2016-06-07' , 3, null),
(2 , '2016-08-23' , 1, null)
;
This code fails
MariaDB [sandbox]> SET @lag = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [sandbox]> UPDATE T
-> SET previousPerformanceStatus = @lag ,
-> @lag:=PerformanceStatus
-> ORDER BY ID, TestDate;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '@lag:=PerformanceStatus
ORDER BY ID, TestDate' at line 3
Commenting out @lag:=PerformanceStatus This code runs
MariaDB [sandbox]> SET @lag = 0;
Query OK, 0 rows affected (0.00 sec)
MariaDB [sandbox]> UPDATE T
-> SET previousPerformanceStatus = @lag
-> #,@lag:=PerformanceStatus
-> ORDER BY ID, TestDate;
Query OK, 0 rows affected (0.01 sec)
Rows matched: 9 Changed: 0 Warnings: 0
Since code at least runs without error and the manual https://dev.mysql.com/doc/refman/5.7/en/update.html states "The SET clause indicates which columns to modify " my take on this is that you cannot set variables in an update statement so simulating lag is not possible using this method.
Upvotes: 1