Dennis Hunink
Dennis Hunink

Reputation: 583

Query on large mysql database

i've got a script which is supposed to run through a mysql database and preform a certain 'test'on the cases. Simplified the database contains records which represent trips that have been made by persons. Each record is a singel trip. But I want to use only roundway trips. So I need to search the database and match two trips to each other; the trip to and the trip from a certain location.

The script is working fine. The problem is that the database contains more then 600.000 cases. I know this should be avoided if possible. But for the purpose of this script and the use of the database records later on, everything has to stick together.

Executing the script takes hours right now, when executing on my iMac using MAMP. Off course I made sure that it can use a lot of memory etcetare.

My question is how could I speed things up, what's the best approach to do this?

Here's the script I have right now:

$table          = $_GET['table'];                  
$output = '';                  
//Select all cases that has not been marked as invalid in previous test         
$query = "SELECT persid, ritid, vertpc, aankpc, jaar, maand, dag FROM MON.$table WHERE reasonInvalid != '1' OR reasonInvalid IS NULL";         
$result = mysql_query($query)or die($output .= mysql_error());                    
$totalCountValid = '';         
$totalCountInvalid = '';         
$totalCount = '';                  
//For each record:         
while($row = mysql_fetch_array($result)){                 
    $totalCount += 1;                 
    //Do another query, get all the rows for this persons ID and that share postal codes. Postal codes revert between the two trips                 
   $persid                 = $row['persid'];                 
   $ritid                  = $row['ritid'];                 
   $pcD                    = $row['vertpc'];                 
   $pcA                    = $row['aankpc'];                 
   $jaar                   = $row['jaar'];                 
   $maand                  = $row['maand'];                 
   $dag                    = $row['dag'];         
   $thecountquery  = "SELECT * FROM MON.$table WHERE persid=$persid AND vertpc=$pcA AND aankpc=$pcD AND jaar = $jaar AND maand = $maand AND dag = $dag";                  
   $thecount               = mysql_num_rows(mysql_query($thecountquery));                 
   if($thecount >= 1){                         
      //No worries, this person ID has multiple trips attached                            
      $totalCountValid += 1;                 
   }else{                         
      //Ow my, the case is invalid!                         
     $totalCountInvalid += 1;                         
     //Call the markInvalid from functions.php                          
     $totalCountValid += 1;                          
     markInvalid($table, '2', 'ritid', $ritid);                 
   }         
}                  
//Echo the result         
$output .= 'Total cases: '.$totalCount.'<br>Valid: '.$totalCountValid.'<br>Invalid: '.$totalCountInvalid;                  echo $output; 

Upvotes: 0

Views: 1147

Answers (1)

verisimilitude
verisimilitude

Reputation: 5108

Your basic problem is that you are doing the following.

1) Getting all cases that haven't been marked as invalid.
2) Looping through the cases obtained in step 1).

What you can easily do is to combine the queries written for 1) and 2) in a single query and loop over the data. This will speed up the things a bit.

Also bear in mind the following tips.

1) Selecting all columns is not at all a good thing to do. It takes ample amount of time for the data to traverse over the network. I would recommend replacing the wild-card with all columns that you really need.

SELECT * <ALL_COlumns>

2) Use indexes - sparingly, efficiently and appropriately. Understand when to use them and when not to.

3) Use views if you can.
4) Enable MySQL slow query log to understand which queries you need to work on and optimize.

log_slow_queries  = /var/log/mysql/mysql-slow.log
long_query_time  = 1
log-queries-not-using-indexes 

5) Use correct MySQL field types and the storage engine (Very very important)
6) Use EXPLAIN to analyze your query - EXPLAIN is a useful command in MySQL which can provide you some great details about how a query is ran, what index is used, how many rows it needs to check through and if it needs to do file sorts, temporary tables and other nasty things you want to avoid.

Good luck.

Upvotes: 2

Related Questions