user1107685

Reputation: 451

PHP - preg_match_all not searching the full string?

I'm using preg_match_all to search through a file that I'm reading in. The file contains many lines of the following format, and I'm extracting the numbers between the tags:

<float_array id="asdfasd_positions-array" count="6">1 2 3 4 5 6</float_array>

I'm using preg_match_all and it is working well - except it gets so far through the file then seems to stop.

preg_match_all("/\<float_array id\=\".+?positions.+?\" count\=\".+?\"\>(.+?)\<\/float_array\>/",$file, $results);

The file is 90,000 rows and about 8MB in size. I'm editing every third number in each extracted string and using str_replace to put the edited string back into the file contents. The file is then written out again. See the full script here:

http://pastie.org/4300537

The script successfully replaces about half the entries and does nothing with the second half of the file. I even copied a successfully edited line from higher in the file and pasted it further down, and the pasted copy wasn't edited. It's as if the array is full, but memory_limit is set to 500M.

Any ideas?

EDIT: Solution Found

I found the problem: in some instances the strings between the tags were too long and were skipped. The cause is a PHP limit. pcre.backtrack_limit was set to 100000 and some strings were longer than this, so I increased it in the .htaccess file using the following line and it now works.

php_value pcre.backtrack_limit 5000000
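
A failure like this is easy to miss because preg_match_all does not raise a warning when it hits the limit. As a minimal sketch (assuming PHP 7 or later; the helper name match_all_or_fail and the two-line sample string are made up for illustration), the limit can also be raised at runtime with ini_set, and preg_last_error can be checked after the call:

```php
<?php
// Hypothetical helper (not from the original script): wraps preg_match_all
// and throws if PCRE reports an error such as PREG_BACKTRACK_LIMIT_ERROR.
function match_all_or_fail(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false || preg_last_error() !== PREG_NO_ERROR) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $matches;
}

// Raise the limit at runtime; same intent as the .htaccess line.
ini_set('pcre.backtrack_limit', '5000000');

// Small stand-in for the real 8MB file.
$file = '<float_array id="a_positions-array" count="3">1 2 3</float_array>' . "\n"
      . '<float_array id="b_positions-array" count="3">4 5 6</float_array>';

$pattern = '/<float_array id=".+?positions.+?" count=".+?">(.+?)<\/float_array>/';
$results = match_all_or_fail($pattern, $file);

print_r($results[1]); // the captured number strings
```

If the exception is thrown, the error code can be compared against the PREG_* error constants to tell a backtrack limit failure from other PCRE errors.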

Upvotes: 4

Views: 1214

Answers (2)

Ωmega

Reputation: 43683

You might consider parsing your text file with a simple parser like this:

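$fi = fopen("data.txt",  "r");
$fo = fopen("data2.txt", "w");
// $status is 1 when the previous chunk ended in a matching opening tag, else 0
$status = 0;
do {
  // Read up to the next ">" so each chunk ends just before a tag closes
  $data = stream_get_line($fi, PHP_INT_MAX, ">");
  if ($data === false) {
    break; // nothing left to read
  }
  if ($status == 1) {
    // Expect "numbers</float_array" and capture the numbers
    preg_match("/(.*)<\/float_array$/", $data, $m);
    $status--;
    if (sizeof($m) != 0) {
      fwrite($fo, $m[1] . "\n");
      continue;
    }
  }
  if ($status == 0) {
    // Look for an opening float_array tag whose id contains "positions"
    preg_match("/<float_array[^>]*?\bid\s*=\s*[\"'][^\"']*?positions[^\"']*?[\"'][^>]*?\bcount\s*\=[^>]*?$/", $data, $m);
    if (sizeof($m) > 0) {
      $status++;
    }
  }
} while (!feof($fi));
fclose($fi);
fclose($fo);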

Upvotes: 0

Ωmega

Reputation: 43683

If memory is the issue and not the execution time limit, then go with the slower solution (line by line):

$fi = fopen("data.txt",  "r");
$fo = fopen("data2.txt", "w");
// fgets() returns false at end of file, so test its result instead of feof()
while (($line = fgets($fi)) !== false) {

  # regex stuff here

  fwrite($fo, $line);
}
fclose($fi);
fclose($fo);
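
The "regex stuff" step could look roughly like the sketch below, which rewrites each matching line in place with preg_replace_callback instead of a later str_replace. It assumes PHP 7 or later. The names edit_third and process_line are hypothetical, and since the question does not say how every third number is changed, the sketch simply negates it. The pattern uses negated character classes such as [^<]* rather than .+?, which should need far less backtracking on long lines.

```php
<?php
// Hypothetical edit: the real change to every third number is not given
// in the question, so this placeholder just negates the value.
function edit_third(float $n): float
{
    return -$n;
}

// Rewrites the numbers inside a matching <float_array> line.
// Lines that do not match are returned unchanged.
function process_line(string $line): string
{
    $pattern = '/(<float_array id="[^"]*positions[^"]*" count="[^"]*">)([^<]*)(<\/float_array>)/';
    $out = preg_replace_callback($pattern, function (array $m): string {
        $numbers = preg_split('/\s+/', trim($m[2]));
        foreach ($numbers as $i => $value) {
            if ($i % 3 === 2) { // indexes 2, 5, 8, ... are every third number
                $numbers[$i] = (string) edit_third((float) $value);
            }
        }
        return $m[1] . implode(' ', $numbers) . $m[3];
    }, $line);

    // preg_replace_callback returns null on a PCRE error; keep the line as-is then
    return $out ?? $line;
}

$line = '<float_array id="asdfasd_positions-array" count="6">1 2 3 4 5 6</float_array>';
echo process_line($line), "\n";
```

Inside the loop this would be used as fwrite($fo, process_line($line)); so only one line is held in memory at a time.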

Upvotes: 2
