Reputation: 801
I have a PHP script that loads data from a first file of about 1M lines. A second, much bigger file of around 30M lines needs to be looked up against the data from the first file. So I load the 1M lines into an array, i.e. $array[$STRINGLOOKUP] = 1;, then iterate over the 30M lines and check each one with array_key_exists() on $array.
The problem is that on my laptop with 32-bit PHP (2GB limit) everything works, but on production with 64-bit PHP it runs out of memory (also with a 2GB limit). I heard that using the pack() function can lower memory consumption. Has anyone tried it, and is it possible/worth trying?
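<?php
// $lines holds the 1M lookup strings, $lines30M the 30M tab-separated records.
$index = array();
foreach ($lines as $line) {
    $index[$line] = 1;
}
foreach ($lines30M as $line) {
    // The lookup string is the second tab-separated field.
    list($junk1, $lookup, $junk2) = explode("\t", $line, 3);
    if (array_key_exists($lookup, $index)) {
        // do something
    }
}
?>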
Upvotes: 1
Views: 58
Reputation: 801
I tried pack() just as an exercise, but it takes more memory than a normal array. There is, however, the SplFixedArray class, which uses less memory. It does not store integers more compactly; the saving comes from its fixed length, which needs less overhead than a normal array.
Here is some sample code that measures the memory usage:
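<?php
// Baseline. Note: this call passes real_usage=true while the later calls
// use the default, so the first delta is only approximate.
$mem = memory_get_usage(true);

// 1) Plain array of integers.
$array = array();
for ($i = 0; $i < 100000; $i++) {
    $array[$i] = 1;
}
$mem1 = memory_get_usage();
echo ($mem1 - $mem) / 1024 / 1024 . " MB\n";
// 13.8 MB

// 2) Plain array of pack()ed 16-bit values (each element is a 2-byte string).
$array2 = array();
for ($i = 0; $i < 100000; $i++) {
    $array2[$i] = pack('v', 1);
}
$mem2 = memory_get_usage();
echo ($mem2 - $mem1) / 1024 / 1024 . " MB\n";
// 17.0 MB

// 3) SplFixedArray of integers, sized up front.
$array3 = new SplFixedArray(100000);
for ($i = 0; $i < 100000; $i++) {
    $array3[$i] = 1;
}
$mem3 = memory_get_usage();
echo ($mem3 - $mem2) / 1024 / 1024 . " MB\n";
// 5.3 MB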
This is a huge memory saving if you know the size of the array in advance and can use integers as keys (SplFixedArray supports only integer keys).
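Since the lookup values in the question are strings rather than integers, one possible way to still use SplFixedArray is to store the strings as sorted values and search them with a binary search. This is only a sketch: fixedArrayContains() is a hypothetical helper, the three strings stand in for the real data, and building the structure this way still needs a plain array temporarily, so the overall saving is untested here.

```php
<?php
// Hypothetical helper: binary search over a sorted SplFixedArray of strings.
function fixedArrayContains(SplFixedArray $sorted, string $needle): bool
{
    $lo = 0;
    $hi = $sorted->getSize() - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        $cmp = strcmp($sorted[$mid], $needle);
        if ($cmp === 0) {
            return true;
        }
        if ($cmp < 0) {
            $lo = $mid + 1; // needle sorts after the middle element
        } else {
            $hi = $mid - 1; // needle sorts before the middle element
        }
    }
    return false;
}

$strings = ['gamma', 'alpha', 'beta']; // stand-in for the lookup strings
sort($strings, SORT_STRING);           // byte-wise order, matching strcmp()
$fixed = SplFixedArray::fromArray($strings);

var_dump(fixedArrayContains($fixed, 'beta'));  // bool(true)
var_dump(fixedArrayContains($fixed, 'delta')); // bool(false)
```

Each lookup costs O(log n) string comparisons instead of one hash lookup, so this trades speed for memory.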
Upvotes: 0
Reputation: 16688
Putting your 1M lines in an array might not be the most memory-efficient way to do this. The overhead runs into the tens of bytes for each array element. Moreover, in the following bit of code you pay for it twice:
foreach ($lines as $line) {
    $index[$line] = 1;
}
because you will end up with two arrays, $lines and $index, both containing the same information. Why not stick to one and use in_array() instead of array_key_exists()? (Bear in mind that in_array() scans the array linearly, so each lookup is slower.)
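A different way to avoid the duplicate array, while keeping fast key lookups, is to build the index while reading the file line by line, so that $lines never exists. A minimal sketch, assuming one lookup string per line; buildIndex() is a hypothetical helper and the temporary file stands in for the 1M-line file:

```php
<?php
// Hypothetical helper: builds the lookup index straight from a file,
// so the lines are never held in a second array.
function buildIndex(string $path): array
{
    $index = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    while (($line = fgets($handle)) !== false) {
        // Strip the trailing newline so keys match the lookup strings.
        $index[rtrim($line, "\r\n")] = true;
    }
    fclose($handle);
    return $index;
}

// Demo with a small temporary file standing in for the 1M-line file.
$tmp = tempnam(sys_get_temp_dir(), 'idx');
file_put_contents($tmp, "alpha\nbeta\ngamma\n");
$index = buildIndex($tmp);
unlink($tmp);

var_dump(isset($index['beta']));  // bool(true)
var_dump(isset($index['delta'])); // bool(false)
```

The same fgets() loop could be used for the 30M-line file, so that only one line of it is in memory at a time.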
But I do agree with the other comments: this is clearly a job for a database. Something like SQLite via PDO? You'll have to learn these someday, and once you know how, you can do so much more.
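As a rough illustration of the SQLite-via-PDO idea, the sketch below loads the lookup strings into an indexed table and checks each record with a prepared statement. The table name lookup, the column name value, and the sample data are made up for illustration, and it assumes the pdo_sqlite extension is available:

```php
<?php
// Sketch only: table and column names are made up for illustration.
$pdo = new PDO('sqlite::memory:'); // a 'sqlite:' DSN with a file path keeps the data on disk
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE lookup (value TEXT PRIMARY KEY)');

// Load the lookup strings inside one transaction, which avoids
// committing every row separately.
$lookupStrings = ['alpha', 'beta', 'gamma']; // stand-in for the 1M lines
$insert = $pdo->prepare('INSERT OR IGNORE INTO lookup (value) VALUES (?)');
$pdo->beginTransaction();
foreach ($lookupStrings as $s) {
    $insert->execute([$s]);
}
$pdo->commit();

// Check each record of the big file against the table.
$find = $pdo->prepare('SELECT 1 FROM lookup WHERE value = ?');
$records = ["x\tbeta\ty", "x\tdelta\ty"]; // stand-in for the 30M lines
$matches = [];
foreach ($records as $line) {
    list($junk1, $lookup, $junk2) = explode("\t", $line, 3);
    $find->execute([$lookup]);
    if ($find->fetchColumn() !== false) {
        $matches[] = $lookup; // the lookup string exists in the first file
    }
    $find->closeCursor();
}
print_r($matches);
```

With a file-backed database the lookup data lives on disk rather than in PHP's memory, at the cost of one query per record.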
Upvotes: 0
Reputation: 48357
Has anyone tried it, and is it possible/worth trying?
No.
You are trying to write a DBMS. Do you think you can do a better job than the guys who write MySQL, MariaDB, SQLite, PostgreSQL, MongoDB, GDBM...?
Upvotes: 2