Reputation: 855
This script takes forever to run, and sometimes it even crashes the whole server.
shop_designs has 3000+ entries and shop_tshirts has over 50k entries.
How can I optimize it for faster execution?
$query = "SELECT * FROM shop_designs";
$res = mysql_query($query) or die(mysql_error());
while($row = mysql_fetch_array($res)) {
$design = $row["design_id"];
$rating = $row["rating"];
$hits = $row["hits"];
$retours = $row["returns"];
$sales = $row["sales"];
$plussales = $row["sales_plus"];
$featured = $row["featured"];
$created_timestamp = $row["created_timestamp"];
$returns = $row["returns"];
$wishlisted = $row["wishlisted"];
$vector = $row["vector"];
$sales = $sales + $plussales - $returns;
if ($sales > 10) {
$bonus = $sales * 100;
} else {
$bonus = $sales * 50;
}
$unan = 60 * 60 * 24 * 365;
$unjour = 60 * 60 * 24;
$age_du_design = time() - $created_timestamp;
$age_du_design_en_jours = $age_du_design / $unjour;
$age_du_design_an = $age_du_design / $unan;
if ($age_du_design < $unan) {
$age_du_design = $unan;
$age_du_design_an = "1";
}
if ($sales == "1") {
$sales_per_year = $sales / $age_du_design_an;
} else {
$sales_per_year = $sales / $age_du_design_an;
}
if ($sales < 1) {
$sales_per_year = "0";
}
$indicehits = ($hits / 1000) / 3;
$calculsales = $sales + $wishlisted - $retours;
$cote = $calculsales + $indicehits;
$cote = round($cote, 2);
if ($vector == "1") {
$cote * 1.25;
}
echo "Update D#$design.......";
$yquery = "UPDATE shop_designs SET rating='$cote', sales_per_year='$sales_per_year' WHERE design_id='$design'";
mysql_query($yquery) or die(mysql_error());
$rating_hits = $hits + $bonus;
$rating_hits = $rating_hits / 2;
$rating_hits = round($rating_hits);
$zyquery = "UPDATE shop_tshirts SET wishlisted='$wishlisted', rating='$cote', hits='$hits', rating_hits='$rating_hits',featured='$featured',tshirt_sales='$sales',sales_per_year='$sales_per_year' WHERE design_id='$design'";
mysql_query($zyquery) or die(mysql_error());
}
Upvotes: 0
Views: 223
Reputation: 108410
Processing each individual row, row by agonizing row (RBAR), is going to be slow. The design of this algorithm totally ignores the power of SQL to process sets of data.
How to redesign this for better performance: rewrite this to run fewer SQL UPDATE statements.
Before we get to that... there's some bizarre logic in the code. For example, what is the result of this:
if ($vector == "1") {
$cote * 1.25;
}
We see $cote gets multiplied by 1.25, but the return from that operation isn't stored anywhere. The result is discarded.
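If the intent was to give vector designs a 25% boost, the result would need to be assigned back, something like this (an assumed intent, not what the posted code does):
if ($vector == "1") {
    $cote = $cote * 1.25;   // assuming the boosted rating was meant to be kept
}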
Why do we need a conditional test here:
if ($sales == "1") {
$sales_per_year = $sales / $age_du_design_an;
} else {
$sales_per_year = $sales / $age_du_design_an;
}
So, if some condition is true, we assign a value to $sales_per_year. Otherwise, we assign the exact same value to $sales_per_year. Why do we need a conditional test?
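Combined with the $sales < 1 guard that follows it, those three branches collapse to a single line (same behavior as the original):
$sales_per_year = ($sales < 1) ? "0" : $sales / $age_du_design_an;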
And the whole rigmarole with the "age" in seconds, age in days, age in years... that essentially boils down to returning the greater of a) one year, or b) the calculated age in years.
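In PHP that whole block boils down to one max() call (a sketch of the equivalent, assuming the same 365-day year):
$age_du_design_an = max(1, (time() - $created_timestamp) / (60 * 60 * 24 * 365));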
The computation of $calculsales subtracts $retours (aka $returns) from $sales. That's not invalid, but it's kind of curious, because $returns has previously been subtracted from $sales.
With those issues aside, I don't see any operations here that can't be performed in SQL expressions, within a SQL statement.
For example:
SELECT n.*
  FROM ( SELECT v.design_id
              , v.wishlisted
              , v.rating
              , v.hits
              , ROUND((v.hits + (v.sales * IF(v.sales > 10, 100, 50))) / 2) AS rating_hits
              , v.featured
              , v.sales
              , IF(v.sales < 1, 0, v.sales / v.age_du_design_an) AS sales_per_year
              , ROUND(v.sales + v.wishlisted - v.returns + v.hits/3000, 2)
                -- * CASE WHEN v.vector = 1 THEN 1.25 ELSE 1.00 END
                AS cote
           FROM ( SELECT d.design_id
                       , d.rating
                       , d.hits
                       , d.featured
                       , (d.sales + d.sales_plus - d.returns) AS sales
                       , GREATEST((UNIX_TIMESTAMP(NOW()) - d.created_timestamp)/(60*60*24*365), 1) AS age_du_design_an
                       , d.returns
                       , d.wishlisted
                       , d.vector
                    FROM shop_designs d
                ) v
       ) n
But instead of fetching individual rows and then issuing a bloatload of individual UPDATE statements, one for each design_id, we can do a JOIN between the query above and the target table to be updated.
We can write that as a SELECT statement first, to test it. And then convert it to an UPDATE statement, something like this:
UPDATE shop_tshirts t
  JOIN ( SELECT v.design_id
              , v.wishlisted
              , v.rating
              , v.hits
              , ROUND((v.hits + (v.sales * IF(v.sales > 10, 100, 50))) / 2) AS rating_hits
              , v.featured
              , v.sales
              , IF(v.sales < 1, 0, v.sales / v.age_du_design_an) AS sales_per_year
              , ROUND(v.sales + v.wishlisted - v.returns + v.hits/3000, 2)
                -- * CASE WHEN v.vector = 1 THEN 1.25 ELSE 1.00 END
                AS cote
           FROM ( SELECT d.design_id
                       , d.rating
                       , d.hits
                       , d.featured
                       , (d.sales + d.sales_plus - d.returns) AS sales
                       , GREATEST((UNIX_TIMESTAMP(NOW()) - d.created_timestamp)/(60*60*24*365), 1) AS age_du_design_an
                       , d.returns
                       , d.wishlisted
                       , d.vector
                    FROM shop_designs d
                ) v
       ) n
    ON n.design_id = t.design_id
   SET t.wishlisted = n.wishlisted
     , t.rating = n.cote
     , t.hits = n.hits
     , t.rating_hits = n.rating_hits
     , t.featured = n.featured
     , t.tshirt_sales = n.sales
     , t.sales_per_year = n.sales_per_year
There's only one execution of that statement required; it will update all the rows in shop_tshirts in one fell swoop. We can do a similar operation for the other table as well.
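For shop_designs, a sketch of that companion statement might look like this (assuming MySQL materializes the derived table, which is the usual workaround for selecting from the table being updated; the original loop only set rating and sales_per_year there):
UPDATE shop_designs s
  JOIN ( SELECT d.design_id
              , IF((d.sales + d.sales_plus - d.returns) < 1, 0,
                   (d.sales + d.sales_plus - d.returns)
                     / GREATEST((UNIX_TIMESTAMP(NOW()) - d.created_timestamp)/(60*60*24*365), 1)) AS sales_per_year
              , ROUND((d.sales + d.sales_plus - d.returns) + d.wishlisted - d.returns + d.hits/3000, 2) AS cote
           FROM shop_designs d
       ) n
    ON n.design_id = s.design_id
   SET s.rating = n.cote
     , s.sales_per_year = n.sales_per_year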
That's how we get improved performance.
FOLLOWUP
If you don't process the UPDATE as a set, and instead process row by agonizing row (RBAR), then be sure that you have a suitable index defined on the shop_tshirts and shop_designs tables, with design_id as the leading column.
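For example (a sketch; adjust or skip whichever index already exists, e.g. if design_id is already the primary key):
ALTER TABLE shop_designs ADD INDEX ix_shop_designs_design_id (design_id);
ALTER TABLE shop_tshirts ADD INDEX ix_shop_tshirts_design_id (design_id);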
Upvotes: 2
Reputation: 4625
I meant something like this
<?php
$query = "SELECT * FROM shop_designs";
$res = mysql_query($query) or die(mysql_error());
$updates = "";
while($row = mysql_fetch_array($res)) {
    .
    .
    .
    echo "Update D#$design.......";
    $updates .= "UPDATE shop_designs SET rating='$cote', sales_per_year='$sales_per_year' WHERE design_id='$design';";
    .
    .
    .
    $updates .= "UPDATE shop_tshirts SET wishlisted='$wishlisted', rating='$cote', hits='$hits', rating_hits='$rating_hits',featured='$featured',tshirt_sales='$sales',sales_per_year='$sales_per_year' WHERE design_id='$design';";
}
mysql_query($updates) or die(mysql_error());
In your previous implementation PHP hits the database 6000+ times, but now it only needs two round trips: first to run the SELECT, and second to send the batched UPDATE statements.
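One caveat to verify on your setup: the legacy mysql_query() call executes only a single statement per call, so sending the batched string in one round trip would need something like mysqli::multi_query(). A minimal sketch of that idea (connection parameters are placeholders):
<?php
// Hypothetical connection parameters for illustration only
$db = new mysqli("localhost", "user", "pass", "shop");

// $updates is the string of semicolon-separated UPDATE statements built in the loop above
if ($db->multi_query($updates)) {
    // Drain all result sets so the connection stays usable afterwards
    do {
        if ($result = $db->store_result()) {
            $result->free();
        }
    } while ($db->more_results() && $db->next_result());
} else {
    die($db->error);
}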
[EDIT]
Also, the constants
    $unan = 60 * 60 * 24 * 365;
    $unjour = 60 * 60 * 24;
and the time() call can be moved outside the loop. It won't make a noticeable difference, but it is cleaner that way.
Upvotes: 0