Guy Messika
Guy Messika

Reputation: 169

Select a record just if the one before it has a lower value takes too long and fail

I've already asked 2 questions about this subject and got great answers on both of them.

  1. Select a record just if the one before it has a lower value

  2. Select a record just if the one before it has a lower value filtered by month

(question 2 is currently not relevant to the subject cause I don't need to filter by month)

now ever since I implemented the code suggested to me, as long as my table kept growing the sql query took longer and longer. In the start it was taking for it 8 seconds to be executed on 1000 rows. now on a table with more than 25,000 rows it just fails.

you can watch my query right here - http://sqlfiddle.com/#!2/5c480/1/0

SELECT  a.ID, DATE_FORMAT(a.Time,'%d/%m/%y') AS T, a.SerialNumber, p.Model,
    b.Remain_Toner_Black BeforeCountBlack,
    a.Remain_Toner_Black AfterCountBlack,
    b.Remain_Toner_Cyan BeforeCountCyan,
    a.Remain_Toner_Cyan AfterCountCyan,
b.Remain_Toner_Magenta BeforeCountMagenta,
    a.Remain_Toner_Magenta AfterCountMagenta,
b.Remain_Toner_Yellow BeforeCountYellow,
    a.Remain_Toner_Yellow AfterCountYellow
FROM    
    (
        SELECT  a.ID, 
                a.Time, 
                a.SerialNumber, 
                a.Remain_Toner_Black,
                a.Remain_Toner_Cyan,
                a.Remain_Toner_Magenta,
                a.Remain_Toner_Yellow,
                (
                    SELECT  COUNT(*)
                    FROM    Reports c
                    WHERE   c.SerialNumber = a.SerialNumber AND
                            c.ID <= a.ID) AS RowNumber
        FROM    Reports a
    ) a
    LEFT JOIN
    (
        SELECT  a.ID, 
                a.Time, 
                a.SerialNumber, 
                a.Remain_Toner_Black,
                a.Remain_Toner_Cyan,
                a.Remain_Toner_Magenta,
                a.Remain_Toner_Yellow,
                (
                    SELECT  COUNT(*)
                    FROM    Reports c
                    WHERE   c.SerialNumber = a.SerialNumber AND
                            c.ID <= a.ID) AS RowNumber
        FROM    Reports a
    ) b ON a.SerialNumber = b.SerialNumber AND
            a.RowNumber = b.RowNumber + 1
INNER JOIN Printers p ON a.SerialNumber = p.SerialNumber
INNER JOIN Customers c ON p.IP = c.IP AND c.Company = 5
WHERE   (b.Remain_Toner_Black < a.Remain_Toner_Black AND b.Remain_Toner_Black >= 0) OR (b.Remain_Toner_Cyan < a.Remain_Toner_Cyan AND b.Remain_Toner_Cyan >= 0) OR (b.Remain_Toner_Magenta < a.Remain_Toner_Magenta AND b.Remain_Toner_Magenta >= 0) OR (b.Remain_Toner_Yellow < a.Remain_Toner_Yellow AND b.Remain_Toner_Yellow >= 0)

I need to work with the following 3 tables in order to select only the printers belong to a specific company which has an ID.

Reports:

ID         SerialNumber         Remain_Toner_Black   
29881      Z30PBAHBB00034E      58   
30001      Z30PBAHBB00034E      98
30200      Z30PBAHBB00034E      70
30205      BVCfdgdfgdf329F      50
30207      BVCfdgdfgdf329F      40
30210      Z30PBAHBB00034E      50
30301      Z30PBAHBB00034E      100

Printers:

IP                 SerialNumber         Customer 
80.179.228.81      Z30PBAHBB00034E      52

Customers:

ID         IP                      Company        
52         80.179.228.81           5

my query works perfectly and return:

ID         SerialNumber            BEFORECOUNTBLACK         AFTERCOUNTBLACK        
30001      Z30PBAHBB00034E         58                       98
30301      Z30PBAHBB00034E         50                       100

but again, it fails now when I'm running it on a table with 25,000 rows in the Reports table.

Upvotes: 3

Views: 122

Answers (1)

fancyPants
fancyPants

Reputation: 51868

Here's a solution for your question 1 which will run much faster, since you have many full table scans and dependent subqueries. Here you will at most have just one table scan (and maybe a temporary table, depending how large your data is and how much memory you've got). I think you can easily adjust it to your question here. Question 2 (I haven't read it really) is probably also answered since it's easy now to just add where date_column = whatever

select * from (
    select
    t.*,
    if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
    @prev_sn := SerialNumber,
    @prev_toner := Remain_Toner_Black
    from
    Table1 t
    , (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber limit 1) var_init
    order by SerialNumber, id
) sq  
where select_it = 1

EDIT:

Explanation:

With this line

    , (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber 

we just initialize the variables @prev_toner and @prev_sn on the fly. It's the same as not having this line in the query at all but writing in front of the query

SET @prev_toner = 0;
SET @prev_sn = (select serialnumber from your_table order by serialnumber limit 1);
SELECT ...

So, why do the query to assign a value to @prev_sn and why order by serialnumber? The order by is very important. Without an order by there's no guaranteed order in which rows are returned. Also we will access the previous rows value with variables, so it's important that same serial numbers are "grouped together".

The columns in the select clause are evaluated one after another, so it's important that you first select this line

if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,

before you select these two lines

@prev_sn := SerialNumber,
@prev_toner := Remain_Toner_Black

Why is that? The last two lines assign just the values of the current rows to the variables. Therefor in this line

if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,

the variables still hold the values of the previous rows. And what we do here is nothing more than saying "if the previous rows value in column Remain_Toner_Black is smaller than the one in the current row and the previous rows serial number is the same as the actual rows serial number, return 1, else return 0."

Then we can simply say in the outer query "select every row, where the above returned 1".

Given your query, you don't need all these subqueries. They are very expensive and unnecessary. Actually it's quite insane. In this part of the query

    SELECT  a.ID, 
            a.Time, 
            a.SerialNumber, 
            a.Remain_Toner_Black,
            a.Remain_Toner_Cyan,
            a.Remain_Toner_Magenta,
            a.Remain_Toner_Yellow,
            (
                SELECT  COUNT(*)
                FROM    Reports c
                WHERE   c.SerialNumber = a.SerialNumber AND
                        c.ID <= a.ID) AS RowNumber
    FROM    Reports a

you select the whole table and for every row you count the rows within that group. That's a dependent subquery. All just to have some sort of row number. Then you do this a second time, just so you can join those two temporary tables to get the previous row. Really, no wonder the performance is horrible.

So, how to adjust my solution to your query? Instead of the one variable I used to get the previous row for Remain_Toner_Black use four for the colours black, cyan, magenta and yellow. And just join the Printers and Customers table like you did already. Don't forget the order by and you're done.

Upvotes: 3

Related Questions