Reputation: 169
I've already asked 2 questions about this subject and got great answers on both of them.
(question 2 is currently not relevant to the subject cause I don't need to filter by month)
now ever since I implemented the code suggested to me, as long as my table kept growing the sql query took longer and longer. In the start it was taking for it 8 seconds to be executed on 1000 rows. now on a table with more than 25,000 rows it just fails.
you can watch my query right here - http://sqlfiddle.com/#!2/5c480/1/0
SELECT a.ID, DATE_FORMAT(a.Time,'%d/%m/%y') AS T, a.SerialNumber, p.Model,
b.Remain_Toner_Black BeforeCountBlack,
a.Remain_Toner_Black AfterCountBlack,
b.Remain_Toner_Cyan BeforeCountCyan,
a.Remain_Toner_Cyan AfterCountCyan,
b.Remain_Toner_Magenta BeforeCountMagenta,
a.Remain_Toner_Magenta AfterCountMagenta,
b.Remain_Toner_Yellow BeforeCountYellow,
a.Remain_Toner_Yellow AfterCountYellow
FROM
(
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
) a
LEFT JOIN
(
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
) b ON a.SerialNumber = b.SerialNumber AND
a.RowNumber = b.RowNumber + 1
INNER JOIN Printers p ON a.SerialNumber = p.SerialNumber
INNER JOIN Customers c ON p.IP = c.IP AND c.Company = 5
WHERE (b.Remain_Toner_Black < a.Remain_Toner_Black AND b.Remain_Toner_Black >= 0) OR (b.Remain_Toner_Cyan < a.Remain_Toner_Cyan AND b.Remain_Toner_Cyan >= 0) OR (b.Remain_Toner_Magenta < a.Remain_Toner_Magenta AND b.Remain_Toner_Magenta >= 0) OR (b.Remain_Toner_Yellow < a.Remain_Toner_Yellow AND b.Remain_Toner_Yellow >= 0)
I need to work with the following 3 tables in order to select only the printers belong to a specific company which has an ID.
Reports:
ID SerialNumber Remain_Toner_Black
29881 Z30PBAHBB00034E 58
30001 Z30PBAHBB00034E 98
30200 Z30PBAHBB00034E 70
30205 BVCfdgdfgdf329F 50
30207 BVCfdgdfgdf329F 40
30210 Z30PBAHBB00034E 50
30301 Z30PBAHBB00034E 100
Printers:
IP SerialNumber Customer
80.179.228.81 Z30PBAHBB00034E 52
Customers:
ID IP Company
52 80.179.228.81 5
my query works perfectly and return:
ID SerialNumber BEFORECOUNTBLACK AFTERCOUNTBLACK
30001 Z30PBAHBB00034E 58 98
30301 Z30PBAHBB00034E 50 100
but again, it fails now when I'm running it on a table with 25,000 rows in the Reports
table.
Upvotes: 3
Views: 122
Reputation: 51868
Here's a solution for your question 1 which will run much faster, since you have many full table scans and dependent subqueries. Here you will at most have just one table scan (and maybe a temporary table, depending how large your data is and how much memory you've got). I think you can easily adjust it to your question here. Question 2 (I haven't read it really) is probably also answered since it's easy now to just add where date_column = whatever
select * from (
select
t.*,
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
@prev_sn := SerialNumber,
@prev_toner := Remain_Toner_Black
from
Table1 t
, (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber limit 1) var_init
order by SerialNumber, id
) sq
where select_it = 1
EDIT:
Explanation:
With this line
, (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber
we just initialize the variables @prev_toner
and @prev_sn
on the fly. It's the same as not having this line in the query at all but writing in front of the query
SET @prev_toner = 0;
SET @prev_sn = (select serialnumber from your_table order by serialnumber limit 1);
SELECT ...
So, why do the query to assign a value to @prev_sn and why order by serialnumber? The order by is very important. Without an order by there's no guaranteed order in which rows are returned. Also we will access the previous rows value with variables, so it's important that same serial numbers are "grouped together".
The columns in the select clause are evaluated one after another, so it's important that you first select this line
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
before you select these two lines
@prev_sn := SerialNumber,
@prev_toner := Remain_Toner_Black
Why is that? The last two lines assign just the values of the current rows to the variables. Therefor in this line
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
the variables still hold the values of the previous rows. And what we do here is nothing more than saying "if the previous rows value in column Remain_Toner_Black is smaller than the one in the current row and the previous rows serial number is the same as the actual rows serial number, return 1, else return 0."
Then we can simply say in the outer query "select every row, where the above returned 1".
Given your query, you don't need all these subqueries. They are very expensive and unnecessary. Actually it's quite insane. In this part of the query
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
you select the whole table and for every row you count the rows within that group. That's a dependent subquery. All just to have some sort of row number. Then you do this a second time, just so you can join those two temporary tables to get the previous row. Really, no wonder the performance is horrible.
So, how to adjust my solution to your query? Instead of the one variable I used to get the previous row for Remain_Toner_Black use four for the colours black, cyan, magenta and yellow. And just join the Printers and Customers table like you did already. Don't forget the order by and you're done.
Upvotes: 3