Reputation: 225
I need to detect files which contain my string. The files can be bigger than 4 GB, so I cannot simply use tools like file_get_contents(), because it tries to load the whole file into RAM.
How can I do this? Using standard PHP? Using elasticsearch or other external search engine?
Upvotes: 4
Views: 4732
Reputation: 2307
You may use something like this. It is not optimized and not heavily tested, but it should give you the idea:
function findInFile($file_name, $search_string, $chunk_size = 1024) {
    // The buffer always holds the previous chunk plus the current one,
    // so a match that straddles a single chunk boundary is still found.
    // A search string longer than one chunk could span three chunks and
    // be missed, so reject that case up front.
    if (strlen($search_string) > $chunk_size) {
        throw new \RuntimeException('Search string must not be longer than the chunk size');
    }
    $file = new \SplFileObject($file_name, 'r');
    $last_buffer = '';
    while (!$file->eof()) {
        $chunk = $file->fread($chunk_size);
        $buffer = $last_buffer . $chunk;
        $position_in_buffer = strpos($buffer, $search_string);
        if ($position_in_buffer !== false) {
            // ftell() points just past the current chunk; step back to
            // the start of the buffer and add the offset of the match
            // to get the position of the string in the file.
            return $file->ftell() - strlen($buffer) + $position_in_buffer;
        }
        $last_buffer = $chunk;
    }
    return null;
}
Upvotes: 5
Reputation: 2167
file_get_contents() returns the contents of the whole file as a single variable. In your case that means PHP tries to create a 4 GB string, which exhausts the allowed memory.
Try using fopen() and fgets() instead. They let you process the file in smaller pieces, one line at a time.
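A minimal sketch of that approach (the function name findLineInFile and its signature are my own, not part of this answer); note that it assumes the search string never spans a line break:
```php
<?php
// Line-by-line search: only one line is held in memory at a time.
// Returns the 1-based line number of the first matching line, or null.
function findLineInFile(string $file_name, string $needle): ?int {
    $handle = fopen($file_name, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Could not open $file_name");
    }
    $line_number = 0;
    while (($line = fgets($handle)) !== false) {
        $line_number++;
        if (strpos($line, $needle) !== false) {
            fclose($handle);
            return $line_number;
        }
    }
    fclose($handle);
    return null;
}
```
Because fgets() stops at each newline, memory use stays proportional to the longest line, not the file size. If the file has no newlines at all, a chunked fread() approach like the one in the other answer is safer.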
Give it a try! :)
Upvotes: 4
Reputation: 475
If you have a Linux-based machine, you can use the grep command:
shell_exec('grep "text string to search" /path/to/file');
The output will contain every line of the file in which your text occurs.
If you need to find all files containing some text in a directory, you can use:
shell_exec('grep -rl "text string to search" /path/to/dir');
-r stands for "recursive", so grep looks inside every file under the directory
-l stands for "list filenames", so only the names of matching files are printed, not the matching lines
As a result, you will get all matching filenames, one per row.
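One caveat: if the search text or the path comes from user input, it should be escaped before being interpolated into the shell command, so that quotes or shell metacharacters in the input cannot break out of the command. A sketch using escapeshellarg() (the variable values are placeholders):
```php
<?php
// Escape user-supplied values before building the shell command.
$needle = 'text string to search';
$dir    = '/path/to/dir';

$command = sprintf(
    'grep -rl %s %s',
    escapeshellarg($needle),
    escapeshellarg($dir)
);

// $output is a newline-separated list of matching file names,
// or null if nothing matched.
$output = shell_exec($command);
```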
Upvotes: 6