Reputation: 431
I am new to PHP coding and I am looking for the fastest way to do a recursive search over all directories for an array of strings.
I am doing it this way:
$contents_list = array("xyz","abc","hello"); // this list can grow any size
$path = "/tmp/"; //user will give any path which can contain multi level sub directories
$dir = new RecursiveDirectoryIterator($path);
foreach(new RecursiveIteratorIterator($dir) as $filename => $file) {
    $fd = fopen($file, 'r');
    if ($fd) {
        while (!feof($fd)) {
            $line = fgets($fd);
            foreach ($contents_list as $content) {
                if (strpos($line, $content) != false) {
                    echo $line."\n";
                }
            }
        }
        fclose($fd);
    }
}
Here I recursively iterate over all directories, and then for each file I iterate over the contents array to search.
Is there any better way to do this kind of search? Please suggest a faster alternative.
Thanks
Upvotes: 14
Views: 6970
Reputation: 121
Even in 2013 there was a (in my eyes much more readable) PHP-native way to iterate recursively over a directory tree: the RecursiveDirectoryIterator class.
Have a look at this sample:
<?php
// Initialize recursive iterators
$directory = new RecursiveDirectoryIterator('path/to/project/');
$iterator  = new RecursiveIteratorIterator($directory);
// With GET_MATCH, each element is a match array (index 0 holds the path),
// not an SplFileInfo object
$regex     = new RegexIterator($iterator, '/^.+\.php$/i', RegexIterator::GET_MATCH);

// Iterate over matching files
foreach ($regex as $match) {
    // Do something with the file found at $match[0]
}
?>
Best regards from Salzburg!
Upvotes: 0
Reputation: 6763
If you're allowed to execute shell commands in your environment (and assuming you're running your script on *nix), you could call the native grep command recursively. That would give you the fastest results.
$contents_list = array("xyz", "abc", "hello");
$path = "/tmp/";
// \| is the alternation operator in grep's basic regular expressions
$pattern = implode('\|', $contents_list);
// Escape both arguments before passing them to the shell
$command = "grep -r " . escapeshellarg($pattern) . " " . escapeshellarg($path);
$output = array();
exec($command, $output);
foreach ($output as $match) {
    echo $match . "\n"; // double quotes, so \n is a real newline
}
If the disable_functions directive is in effect and you can't call grep, you could use your approach with RecursiveDirectoryIterator and read the files line by line, using strpos on each line. Please note that strpos requires a strict equality check (use !== false instead of != false), otherwise you'll skip matches at the beginning of a line.
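To see why the strict check matters, here is a minimal illustration: when the needle sits at position 0, strpos returns the integer 0, which loosely compares equal to false.

```php
<?php
$line = "hello world";

var_dump(strpos($line, "hello"));           // int(0) -- match at the very start
var_dump(strpos($line, "hello") != false);  // bool(false) -- 0 == false, match is lost
var_dump(strpos($line, "hello") !== false); // bool(true)  -- strict check keeps it
```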
A slightly faster way is to use glob recursively to obtain a list of files, and then read each file at once with file() instead of scanning it line by line. According to my tests, this approach gives you about a 30-35% time advantage over yours.
function recursiveDirList($dir, $prefix = '') {
    $dir = rtrim($dir, '/');
    $result = array();
    foreach (glob("$dir/*", GLOB_MARK) as $f) {
        // GLOB_MARK appends a slash to directory names, so recurse into those
        if (substr($f, -1) === '/') {
            $result = array_merge($result, recursiveDirList($f, $prefix . basename($f) . '/'));
        } else {
            $result[] = $prefix . basename($f);
        }
    }
    return $result;
}
$files = recursiveDirList($path);
foreach ($files as $filename) {
$file_content = file($path . '/' . $filename);
foreach ($file_content as $line) {
foreach($contents_list as $content) {
if(strpos($line, $content) !== false) {
echo $line . '\n';
}
}
}
}
Credit for the recursive glob function goes to http://proger.i-forge.net/3_ways_to_recursively_list_all_files_in_a_directory/Opc
To sum it up, performance-wise you have the following rankings (results in seconds for a fairly large directory containing ~1200 files recursively, using two common text patterns):
glob and read files with file() - 9.4443s
RecursiveDirectoryIterator and read files line by line with fgets() - 15.1183s
Upvotes: 16