Reputation: 207922
There is a large table that holds millions of records. phpMyAdmin reports 1.2G size for the table.
There is a calculation that needs to be done for every row. The calculation is not simple (cannot be put in set col= calc format), it uses a stored function to get the values, so currently we have for each row a single update.
This is extremely slow and we want to optimize it.
Stored function:
https://gist.github.com/a9c2f9275644409dd19d
And this is called by this method for every row:
https://gist.github.com/82adfd97b9e5797feea6
This is performed on a off live server, and usually it is updated once per week.
What options we have here.
Upvotes: 1
Views: 184
Reputation: 6645
Well, the problem doesn't seem to be the UPDATE query because no calculations are performed in the query itself. As it seems the calculations are performed first and then the UPDATE query is run. So the UPDATE should be quick enough.
When you say "this is extremely slow", I assume you are not referring to the UPDATE query but the complete process. Here are some quick thoughts:
As you said there are millions of records, updating those many entries is always time consuming. And if there are many columns and indexes defined on the table, it will add to the overhead.
I see that there are many REPLACE INTO queries in the function getNumberOfPeople(). These might as well be a reason for the slow process. Have you checked how efficient are these REPLACE INTO queries? Can you try removing them and then see if it has any impact on the UPDATE process.
There are a couple of SELECT queries too in getNumberOfPeople(). Check if they might be impacting the process and if so, try optimizing them.
In procedure updateGPCD(), you may try replacing SELECT COUNT(*) INTO _has_breakdown
with SELECT COUNT(1) INTO _has_breakdown
. In the same query, the WHERE condition is reading _ACCOUNT but this will fail when _ACCOUNT = 0, no?
On another suggestion, if it is the UPDATE that you think is slow because of reason 1, it might make sense to move the updating column gpcd
outside usage_bill
to another table. The only other column in the table should be the unique ID from usage_bill
.
Hope the above make sense.
Upvotes: 0
Reputation: 3523
Why not setup a separate table to hold the computed values to take the load off your current table. It can have two columns: primary key for each row in your main table and a column for the computed value.
Then your process can be:
a) Truncate computedValues table - This is faster than trying to identify new rows
b) Compute the values and insert into the computed values table
c) So when ever you need your computed values you join to the computedValues table using a primary key join which is fast, and in case you need more computations well you just add new columns.
d) You can also update the main table using the computed values if you have to
Upvotes: 4