kentor
kentor

Reputation: 18524

How to handle/optimize thousands of different to executed SELECT queries?

I need to synchronize specific information between two databases (one mysql, the other a remote hosted SQL Server database) for thousands of rows. When I execute this php file it gets stuck/timeouts after several minutes I guess, so I wonder how I can fix this issue and maybe also optimize the way of "synchronizing" it.

What the code needs to do:

Basically I want to get for every row (= one account) in my database which gets updated - two specific pieces of information (= 2 SELECT queries) from another SQL Server database. Therefore I use a foreach loop which creates 2 SQL queries for each row and afterwards I update those information into 2 columns of this row. We talk about ~10k Rows which needs to run thru this foreach loop.

My idea which may help?

I have heard about things like PDO Transactions which should collect all those queries and sending them afterwards in a package of all SELECT queries, but I have no idea whether I use them correctly or whether they even help in such cases.

This is my current code, which is timing out after few minutes:

// DBH => MSSQL DB | DB => MySQL DB
$dbh->beginTransaction();
// Get all referral IDs which needs to be updated:
$listAccounts = "SELECT * FROM Gifting WHERE refsCompleted <= 100 ORDER BY idGifting ASC";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();

foreach($ps_listAccounts as $row) {
    $refid=$row['refId'];
    // Refsinserted
    $refsInserted = "SELECT count(username) as done FROM accounts WHERE referral='$refid'";
    $ps_refsInserted = $dbh->prepare($refsInserted);
    $ps_refsInserted->execute();
    $row = $ps_refsInserted->fetch();
    $refsInserted = $row['done'];

    // Refscompleted
    $refsCompleted = "SELECT count(username) as done FROM accounts WHERE referral='$refid' AND finished=1";
    $ps_refsCompleted = $dbh->prepare($refsCompleted);
    $ps_refsCompleted->execute();
    $row2 = $ps_refsCompleted->fetch();
    $refsCompleted = $row2['done'];

    // Update fields for local order db
    $updateGifting = "UPDATE Gifting SET refsInserted = :refsInserted, refsCompleted = :refsCompleted WHERE refId = :refId";
    $ps_updateGifting = $db->prepare($updateGifting);

    $ps_updateGifting->bindParam(':refsInserted', $refsInserted);
    $ps_updateGifting->bindParam(':refsCompleted', $refsCompleted);
    $ps_updateGifting->bindParam(':refId', $refid);
    $ps_updateGifting->execute();
    echo "$refid: $refsInserted Refs inserted / $refsCompleted Refs completed<br>";
}

$dbh->commit();

Upvotes: 0

Views: 369

Answers (1)

Jonathan Kuhn
Jonathan Kuhn

Reputation: 15301

You can do all of that in one query with a correlated sub-query:

UPDATE Gifting
SET
    refsInserted=(SELECT COUNT(USERNAME)
                    FROM accounts
                    WHERE referral=Gifting.refId),
    refsCompleted=(SELECT COUNT(USERNAME)
                    FROM accounts
                    WHERE referral=Gifting.refId
                        AND finished=1)

A correlated sub-query is essentially using a sub-query (query within a query) that references the parent query. So notice that in each of the sub-queries I am referencing the Gifting.refId column in the where clause of each sub-query. While this isn't the best for performance because each of those sub-queries still has to run independent of the other queries, it would perform much better (and likely as good as you are going to get) than what you have there.

Edit:

And just for reference. I don't know if a transaction will help here at all. Typically they are used when you have several queries that depend on each other and to give you a way to rollback if one fails. For example, banking transactions. You don't want the balance to deduct some amount until a purchase has been inserted. And if the purchase fails inserting for some reason, you want to rollback the change to the balance. So when inserting a purchase, you start a transaction, run the update balance query and the insert purchase query and only if both go in correctly and have been validated do you commit to save.

Edit2:

If I were doing this, without doing an export/import this is what I would do. This makes a few assumptions though. First is that you are using a mssql 2008 or newer and second is that the referral id is always a number. I'm also using a temp table that I insert numbers into because you can insert multiple rows easily with a single query and then run a single update query to update the gifting table. This temp table follows the structure CREATE TABLE tempTable (refId int, done int, total int).

//get list of referral accounts
//if you are using one column, only query for one column
$listAccounts = "SELECT DISTINCT refId FROM Gifting WHERE refsCompleted <= 100 ORDER BY idGifting ASC";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();

//loop over and get list of refIds from above.
$refIds = array();
foreach($ps_listAccounts as $row){
    $refIds[] = $row['refId'];
}


if(count($refIds) > 0){
    //implode into string for use in query below
    $refIds = implode(',',$refIds);

    //select out total count
    $totalCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE referral IN ($refIds) GROUP BY referral";
    $ps_totalCounts = $dbh->prepare($totalCount);
    $ps_totalCounts->execute();

    //add to array of counts
    $counts = array();

    //loop over total counts
    foreach($ps_totalCounts as $row){
        //if referral id not found, add it
        if(!isset($counts[$row['referral']])){
            $counts[$row['referral']] = array('total'=>0,'done'=>0);
        }
        //add to count
        $counts[$row['referral']]['total'] += $row['cnt'];
    }

    $doneCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE finished=1 AND referral IN ($refIds) GROUP BY referral";
    $ps_doneCounts = $dbh->prepare($doneCount);
    $ps_doneCounts->execute();

    //loop over total counts
    foreach($ps_totalCounts as $row){
        //if referral id not found, add it
        if(!isset($counts[$row['referral']])){
            $counts[$row['referral']] = array('total'=>0,'done'=>0);
        }
        //add to count
        $counts[$row['referral']]['done'] += $row['cnt'];
    }

    //now loop over counts and generate insert queries to a temp table.
    //I suggest using a temp table because you can insert multiple rows
    //in one query and then the update is one query.
    $sqlInsertList = array();
    foreach($count as $refId=>$count){
        $sqlInsertList[] = "({$refId}, {$count['done']}, {$count['total']})";
    }

    //clear out the temp table first so we are only inserting new rows
    $truncSql = "TRUNCATE TABLE tempTable";
    $ps_trunc = $db->prepare($truncSql);
    $ps_trunc->execute();

    //make insert sql with multiple insert rows
    $insertSql = "INSERT INTO tempTable (refId, done, total) VALUES ".implode(',',$sqlInsertList);
    //prepare sql for insert into mssql
    $ps_insert = $db->prepare($insertSql);
    $ps_insert->execute();

    //sql to update existing rows
    $updateSql = "UPDATE Gifting
                    SET refsInserted=(SELECT total FROM tempTable WHERE refId=Gifting.refId),
                        refsCompleted=(SELECT done FROM tempTable WHERE refId=Gifting.refId)
                    WHERE refId IN (SELECT refId FROM tempTable)
                        AND refsCompleted <= 100";
    $ps_update = $db->prepare($updateSql);
    $ps_update->execute();
} else {
    echo "There were no reference ids found from \$dbh";
}

Upvotes: 2

Related Questions