Reputation: 21
I have a large database that contains the results of an experiment for 1500 individuals. Each individual has 96 data points. I wrote the following script to summarize and then format the data so it can be used by the analysis software. All was well until I had more than 500 individuals; now I am running out of memory.
I was wondering if anyone has a suggestion on how to overcome the memory limit problem without sacrificing speed.
This is how the table looks in the database
fishId assayId allele1 allele2
14_1_1 1 A T
14_1_1 2 A A
$mysql = new PDO('mysql:host=localhost; dbname=aquatech_DB', $db_user, $db_pass);
$query = $mysql->prepare("SELECT genotyped.fishid, genotyped.assayid, genotyped.allele1, genotyped.allele2, fishId.sex, " .
"fishId.role FROM `fishId` INNER JOIN genotyped ON genotyped.fishid=fishId.catId WHERE fishId.projectid=:project");
$query->bindParam(':project', $project, PDO::PARAM_INT);
$query->execute();
So this is the call to the database. It is joining information from two tables to build the file I need.
if(!$query){
    $error = $query->errorInfo();
    print_r($error);
} else {
    $data = array();
    $rows = array();
    if($results = $query->fetchAll()){
        foreach($results as $row)
        {
            $rows[] = $row[0];
            $role[$row[0]] = $row[5];
            $data[$row[0]][$row[1]]['alelleY'] = $row[2];
            $data[$row[0]][$row[1]]['alelleX'] = $row[3];
        }
        $rows = array_unique($rows);
        foreach($rows as $ids)
        {
            $col2 = $role[$ids];
            $alelleX = $alelleY = $content = "";
            foreach($snp as $loci)
            {
                $alelleY = convertAllele($data[$ids][$loci]['alelleY']);
                $alelleX = convertAllele($data[$ids][$loci]['alelleX']);
                $content .= "$alelleY\t$alelleX\t";
            }
            $body .= "$ids\t$col2\t" . substr($content, 0, -1) . "\n";
This parses the data. In the file I need, there has to be one row per individual rather than 96 rows per individual; that is why the data has to be reformatted. At the end of the script I just write $body to a file.
I need the output file to be
FishId    Assay 1    Assay 2
14_1_1    A T        A A
$location = "results/" . "$filename" . "_result.txt";
$fh = fopen("$location", 'w') or die ("Could not create destination file");
if(fwrite($fh, $body))
Upvotes: 2
Views: 5427
Reputation: 584
fetchAll() fetches the entire result in one go, which has its uses but is greedy with memory. Why not just use fetch(), which handles one row at a time?
You seem to be indexing the rows by the first column, creating another large array, and then removing duplicate items. Why not use SELECT DISTINCT in the query to remove duplicates before they get to PHP?
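For reference, a sketch of the same prepare() call with only the DISTINCT keyword added (table and column names are taken from the question):

// DISTINCT makes MySQL drop rows that are identical in every selected
// column before they ever reach PHP.
$query = $mysql->prepare(
    "SELECT DISTINCT genotyped.fishid, genotyped.assayid, genotyped.allele1, genotyped.allele2, " .
    "fishId.sex, fishId.role " .
    "FROM `fishId` INNER JOIN genotyped ON genotyped.fishid = fishId.catId " .
    "WHERE fishId.projectid = :project"
);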
I'm not sure what the impact would be on speed - fetch() may be slower than fetchAll() - but you don't have to remove duplicates from the array, which saves some processing.
I'm also not sure what your second foreach is doing, but you should be able to do it all in a single pass, i.e. a foreach loop within a fetch loop.
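A rough sketch of that single-pass structure, assuming the query gains an ORDER BY genotyped.fishid so that each individual's 96 rows arrive together; $snp, convertAllele() and $location are taken from the question, and the $flush closure is a hypothetical helper:

// Open the output file up front and stream one line per fish into it,
// so neither the full result set nor $body is held in memory.
$fh = fopen($location, 'w') or die("Could not create destination file");

$currentId = null;
$assays    = array();   // alleles buffered for the fish currently being read
$role      = '';

// Hypothetical helper: turn one fish's buffered assays into one output line.
$flush = function ($id, $role, $assays) use ($fh, $snp) {
    $content = "";
    foreach ($snp as $loci) {
        $content .= convertAllele($assays[$loci]['alleleY']) . "\t"
                  . convertAllele($assays[$loci]['alleleX']) . "\t";
    }
    fwrite($fh, "$id\t$role\t" . substr($content, 0, -1) . "\n");
};

while ($row = $query->fetch(PDO::FETCH_NUM)) {
    if ($currentId !== null && $row[0] !== $currentId) {
        $flush($currentId, $role, $assays);   // previous fish is complete
        $assays = array();
    }
    $currentId = $row[0];
    $role      = $row[5];
    $assays[$row[1]] = array('alleleY' => $row[2], 'alleleX' => $row[3]);
}
if ($currentId !== null) {
    $flush($currentId, $role, $assays);       // write out the last fish
}
fclose($fh);

Memory use then depends only on the 96 assays of the fish currently being read, not on the number of individuals.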
Other observations on your code above:
- The $role array seems to do the same indexing job as $rows - using $row[0] as the key effectively removes the duplicates in a single pass. Removing the duplicates by SELECT DISTINCT is probably better but, if not, do you need the $rows array and the array_unique() call at all?
- If the same $row[0] can have different values of $row[5], then your indexing method will be discarding data - but you know what's in your data, so I guess you've already thought of that (the same could be true of the $data array).
Upvotes: 2
Reputation: 356
Instead of reading the whole result from your database query into a variable with fetchAll(), fetch it row by row:
while($row = $query->fetch()) { ... }
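For example, dropped into the code from the question the change is just the loop header; the body that groups the rows by fish can stay as it is:

$data = array();
$rows = array();
$role = array();

// Pull rows one at a time instead of building $results with fetchAll().
while ($row = $query->fetch()) {
    $rows[] = $row[0];
    $role[$row[0]] = $row[5];
    $data[$row[0]][$row[1]]['alelleY'] = $row[2];
    $data[$row[0]][$row[1]]['alelleX'] = $row[3];
}
// ...the formatting loop from the question then runs unchanged on $data.

Note that this only avoids the extra copy that fetchAll() builds; $data still grows with the number of individuals, so writing each fish's line to the file as soon as it is complete saves the most memory.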
Upvotes: 4