libertaire

Reputation: 855

Optimizing repetitive MySQL queries

This script takes forever to run and sometimes it even crashes the whole server.

shop_designs has 3000+ entries and shop_tshirts has over 50k entries

How can I optimize it for faster execution?

$query = "SELECT * FROM shop_designs";
$res = mysql_query($query) or die(mysql_error());
while($row = mysql_fetch_array($res)) {
$design = $row["design_id"];
$rating = $row["rating"];
$hits = $row["hits"];
$retours = $row["returns"];
$sales = $row["sales"];
$plussales = $row["sales_plus"];
$featured = $row["featured"];
$created_timestamp = $row["created_timestamp"];
$returns = $row["returns"];
$wishlisted = $row["wishlisted"];
$vector = $row["vector"];
$sales = $sales + $plussales - $returns;

    if ($sales > 10) {
    $bonus = $sales * 100;
    } else {
        $bonus = $sales * 50;
    }

$unan = 60 * 60 * 24 * 365;
$unjour = 60 * 60 * 24;
$age_du_design = time() - $created_timestamp;
$age_du_design_en_jours = $age_du_design / $unjour;
$age_du_design_an = $age_du_design / $unan;
    if ($age_du_design < $unan) {
    $age_du_design = $unan;
    $age_du_design_an = "1";
    }

if ($sales == "1") {
$sales_per_year = $sales / $age_du_design_an;
} else {
$sales_per_year = $sales / $age_du_design_an;
}
    if ($sales < 1) {
    $sales_per_year = "0";
    }

$indicehits = ($hits / 1000) / 3;
$calculsales = $sales + $wishlisted - $retours;
$cote = $calculsales + $indicehits;
$cote = round($cote, 2);
if ($vector == "1") {
$cote * 1.25;
}


echo "Update D#$design.......";
$yquery = "UPDATE shop_designs SET rating='$cote', sales_per_year='$sales_per_year' WHERE design_id='$design'";
mysql_query($yquery) or die(mysql_error());

$rating_hits = $hits + $bonus;
$rating_hits = $rating_hits / 2;
$rating_hits = round($rating_hits);
$zyquery = "UPDATE shop_tshirts SET wishlisted='$wishlisted', rating='$cote', hits='$hits', rating_hits='$rating_hits',featured='$featured',tshirt_sales='$sales',sales_per_year='$sales_per_year' WHERE design_id='$design'";
mysql_query($zyquery) or die(mysql_error());

}

Upvotes: 0

Views: 223

Answers (2)

spencer7593

Reputation: 108410

Processing each individual row, row by agonizing row (RBAR), is going to be slow. The design of this algorithm totally ignores the power of SQL to process sets of data.

How to redesign this for better performance: rewrite it so it runs far fewer SQL UPDATE statements.


Before we get to that... there's some bizarre logic in the code. For example, what is the result of this:

 if ($vector == "1") {
 $cote * 1.25;
 }

We see $cote gets multiplied by 1.25, but the result of that operation isn't stored anywhere; it is simply discarded.
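If the intent was to boost the score for vector designs, the result has to be assigned back, along these lines:

 if ($vector == "1") {
     $cote = $cote * 1.25;
 }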

Why do we need a conditional test here:

 if ($sales == "1") {
 $sales_per_year = $sales / $age_du_design_an;
 } else {
 $sales_per_year = $sales / $age_du_design_an;
 }

So, if some condition is true, we assign a value to $sales_per_year. Otherwise, we assign the exact same value to $sales_per_year. Why do we need a conditional test?
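Since both branches assign the same value, that whole block (together with the $sales < 1 check that follows it) collapses to a single assignment:

 $sales_per_year = ($sales < 1) ? 0 : $sales / $age_du_design_an;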

And the whole rigmarole with the "age" in seconds, age in days, age in years... that essentially boils down to returning the greater of a) one year, or b) the calculated age in years.
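In PHP that is a one-line max() call, with no need for the day and year intermediates:

 $age_du_design_an = max((time() - $created_timestamp) / $unan, 1);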

The computation of $calculsales subtracts $retours (aka $returns) from $sales. That's not invalid, but it's kind of curious, because $returns has previously been subtracted from $sales.


Those issues aside, I don't see any operations here that can't be performed in SQL expressions, within a single SQL statement.

For example:

 SELECT n.*
   FROM ( SELECT v.design_id
               , v.wishlisted
               , v.rating
               , v.hits
                , ROUND((v.hits + (v.sales * IF(v.sales > 10, 100, 50))) / 2) AS rating_hits
               , v.featured
               , v.sales
               , IF(v.sales < 1, 0, v.sales / v.age_du_design_an) AS sales_per_year
               , ROUND(v.sales + v.wishlisted - v.returns + v.hits/3000,2)
                 -- * CASE WHEN v.vector = 1 THEN 1.25 ELSE 1.00 END
                 AS cote
            FROM (
                   SELECT d.design_id
                        , d.rating
                        , d.hits
                        , d.featured
                         , (d.sales + d.sales_plus - d.returns) AS sales
                         , GREATEST((UNIX_TIMESTAMP(NOW())-d.created_timestamp)/(60*60*24*365),1) AS age_du_design_an
                        , d.returns
                        , d.wishlisted
                        , d.vector
                     FROM shop_designs d
                 ) v
        ) n

But instead of fetching individual rows and then issuing a boatload of individual UPDATE statements, one for each design_id, we can do a JOIN between the query above and the target table to be updated.

We can write that as a SELECT statement first, to test it. And then convert it to an UPDATE statement, something like this:

 UPDATE shop_tshirts t
   JOIN ( SELECT v.design_id
               , v.wishlisted
               , v.rating
               , v.hits
                , ROUND((v.hits + (v.sales * IF(v.sales > 10, 100, 50))) / 2) AS rating_hits
               , v.featured
               , v.sales
               , IF(v.sales < 1, 0, v.sales / v.age_du_design_an) AS sales_per_year
               , ROUND(v.sales + v.wishlisted - v.returns + v.hits/3000,2)
                 -- * CASE WHEN v.vector = 1 THEN 1.25 ELSE 1.00 END
                 AS cote
            FROM (
                   SELECT d.design_id
                        , d.rating
                        , d.hits
                        , d.featured
                         , (d.sales + d.sales_plus - d.returns) AS sales
                         , GREATEST((UNIX_TIMESTAMP(NOW())-d.created_timestamp)/(60*60*24*365),1) AS age_du_design_an
                        , d.returns
                        , d.wishlisted
                        , d.vector
                     FROM shop_designs d
                 ) v
        ) n
     ON n.design_id = t.design_id
    SET t.wishlisted     = n.wishlisted
      , t.rating         = n.cote
      , t.hits           = n.hits
      , t.rating_hits    = n.rating_hits
      , t.featured       = n.featured
      , t.tshirt_sales   = n.sales
      , t.sales_per_year = n.sales_per_year 

Only one execution of that statement is required; it updates all the rows in shop_tshirts in one fell swoop. We can do a similar operation for the other table (shop_designs) as well, as sketched below.
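Untested, but since shop_designs only gets new rating and sales_per_year values computed from its own columns, its update can even be a plain single-table UPDATE, something like this (assuming the column names match the $row[...] keys in the question, preserving the double subtraction of returns noted earlier, and leaving the vector multiplier out, as above):

 UPDATE shop_designs
    SET rating = ROUND((sales + sales_plus - returns) + wishlisted - returns + hits/3000, 2)
      , sales_per_year = IF((sales + sales_plus - returns) < 1, 0,
            (sales + sales_plus - returns) /
            GREATEST((UNIX_TIMESTAMP(NOW()) - created_timestamp)/(60*60*24*365), 1))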

That's how we get improved performance.


FOLLOWUP

If you don't process the UPDATE as a set, and instead process row by agonizing row (RBAR), then be sure that you have suitable indexes defined on the shop_tshirts and shop_designs tables, with design_id as the leading column.
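For example, if design_id is not already the primary key or an indexed column on shop_tshirts, something along these lines (the index name is just a placeholder):

 ALTER TABLE shop_tshirts ADD INDEX ix_shop_tshirts_design_id (design_id);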

Upvotes: 2

Nandakumar V

Reputation: 4625

I meant something like this:

<?php
$query = "SELECT * FROM shop_designs";
$res = mysql_query($query) or die(mysql_error());

$updates = "";  // collect every UPDATE statement in one string
while ($row = mysql_fetch_array($res)) {
    // ...
    echo "Update D#$design.......";
    $updates .= "UPDATE shop_designs SET rating='$cote', sales_per_year='$sales_per_year' WHERE design_id='$design';";
    // ...
    $updates .= "UPDATE shop_tshirts SET wishlisted='$wishlisted', rating='$cote', hits='$hits', rating_hits='$rating_hits',featured='$featured',tshirt_sales='$sales',sales_per_year='$sales_per_year' WHERE design_id='$design';";
}

mysql_query($updates) or die(mysql_error());

In your previous implementation PHP hits the database 6000+ times; this way it only needs two round trips: one for the SELECT and one for the combined bulk of UPDATE statements.

[EDIT]
Also, $unan = 60 * 60 * 24 * 365;, $unjour = 60 * 60 * 24; and the time() call can be moved outside the loop, as in the sketch below. It will not make a noticeable difference, but it is cleaner that way.
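A sketch of that hoisting ($now is just a name for the cached timestamp):

<?php
// constants and the current time never change inside the loop,
// so compute them once up front
$unan   = 60 * 60 * 24 * 365;
$unjour = 60 * 60 * 24;
$now    = time();

$res = mysql_query("SELECT * FROM shop_designs") or die(mysql_error());
while ($row = mysql_fetch_array($res)) {
    $age_du_design = $now - $row["created_timestamp"];
    // ... the rest of the per-row calculations ...
}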

Upvotes: 0
