Al.

Reputation: 2882

Efficient flat file searching in PHP

I'd like to store 0 to ~5000 IP addresses in a plain text file, with an unrelated header at the top. Something like this:

Unrelated data
Unrelated data
----SEPARATOR----
1.2.3.4
5.6.7.8
9.1.2.3

Now I'd like to find if '5.6.7.8' is in that text file using PHP. I've only ever loaded an entire file and processed it in memory, but I wondered if there was a more efficient way of searching a text file in PHP. I only need a true/false if it's there.

Could anyone shed any light? Or would I be stuck with loading in the whole file first?

Thanks in advance!

Upvotes: 3

Views: 3017

Answers (7)

Tanner Ottinger

Reputation: 3060

Are you trying to compare the current IP with the IPs listed in the text file? The unrelated data wouldn't match anyway, so just use strpos on the full file contents (file_get_contents).

<?php
    $file = file_get_contents('data.txt');
    $pos = strpos($file, $_SERVER['REMOTE_ADDR']);
    if($pos === false) {
        echo "no match for $_SERVER[REMOTE_ADDR]";
    }
    else {
        echo "match for $_SERVER[REMOTE_ADDR]!";
    }
?>
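
One caveat with a bare strpos is that it matches substrings, so 1.2.3.4 would also be found inside 11.2.3.45. A minimal sketch of one way around that, assuming one entry per line; the helper name containsIpLine is hypothetical and not part of the answer above:

```php
<?php
// Hypothetical helper: true only if $ip occupies a whole line of $contents.
// Assumes one entry per line.
function containsIpLine(string $contents, string $ip): bool
{
    // Normalise Windows line endings, then wrap the text in newlines so the
    // first and last lines also have a delimiter on both sides.
    $haystack = "\n" . str_replace("\r\n", "\n", $contents) . "\n";
    return strpos($haystack, "\n" . $ip . "\n") !== false;
}

$contents = "Unrelated data\n----SEPARATOR----\n11.2.3.45\n5.6.7.8\n";
var_dump(containsIpLine($contents, '1.2.3.4')); // bool(false)
var_dump(containsIpLine($contents, '5.6.7.8')); // bool(true)
```

This still reads the whole file; it only tightens the comparison.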

Upvotes: 0

shedd

Reputation: 4228

I haven't tested this personally, but there is a snippet of code in the PHP manual that is written for large file parsing:

http://www.php.net/manual/en/function.fgets.php#59393

//File to be opened
$file = "huge.file";
//Open file (DON'T USE a+ pointer will be wrong!)
$fp = fopen($file, 'r');
//Read 16meg chunks
$read = 16777216;
//\n Marker
$part = 0;

while(!feof($fp)) {
    $rbuf = fread($fp, $read);
    for($i=$read;$i > 0 || $n == chr(10);$i--) {
        $n=substr($rbuf, $i, 1);
        if($n == chr(10))break;
        //If we are at the end of the file, just grab the rest and stop loop
        elseif(feof($fp)) {
            $i = $read;
            $buf = substr($rbuf, 0, $i+1);
            break;
        }
    }
    //This is the buffer we want to do stuff with, maybe throw to a function?
    $buf = substr($rbuf, 0, $i+1);
    //Point marker back to last \n point
    $part = ftell($fp)-($read-($i+1));
    fseek($fp, $part);
}
fclose($fp);

The snippet was written by the original author: hackajar yahoo com

Upvotes: 0

Phill Pafford

Reputation: 85378

You could use the grep command with backticks in your PHP code on a Linux server. Something like:

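// Quote both values so the shell treats them as single, literal arguments
$searchFor = escapeshellarg('5.6.7.8');
$file      = escapeshellarg('/path/to/file.txt');

// -F: treat the pattern as a fixed string (dots are not wildcards)
// -x: match whole lines only
$grepCmd   = `grep -Fx $searchFor $file`;
echo $grepCmd;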

Upvotes: 0

Bart

Reputation: 6814

You might try fgets()

It reads a file line by line. I'm not sure how much more efficient this is though. I'm guessing that if the IP was towards the top of the file it would be more efficient and if the IP was towards the bottom it would be less efficient than just reading in the whole file.
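
A minimal sketch of that line-by-line approach, assuming the IP list starts after a line equal to ----SEPARATOR---- as in the question; the function name ipInFile is hypothetical:

```php
<?php
// Hypothetical helper: scan the file line by line and stop at the first match.
// Assumes the IP list begins after a line equal to $separator.
function ipInFile(string $path, string $ip, string $separator = '----SEPARATOR----'): bool
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false; // treat an unreadable file as "not found"
    }

    $inList = false;
    $found  = false;
    while (($line = fgets($fp)) !== false) {
        $line = rtrim($line, "\r\n");
        if (!$inList) {
            // Skip the header until the separator line has been seen
            $inList = ($line === $separator);
            continue;
        }
        if ($line === $ip) {
            $found = true;
            break; // early exit: the rest of the file is never read
        }
    }
    fclose($fp);
    return $found;
}
```

Only one line is held in memory at a time, and the loop ends as soon as a match is found.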

Upvotes: 0

Cody Caughlan

Reputation: 32748

You could shell out and grep for it.
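
A rough sketch of what that could look like, assuming a Unix-like host with grep on the PATH and exec() enabled; the function name grepHasLine is hypothetical:

```php
<?php
// Hypothetical helper: ask grep whether $ip is a whole line of $path.
// Assumes grep is installed and exec() is not disabled.
function grepHasLine(string $path, string $ip): bool
{
    // -q: print nothing, only set the exit status
    // -F: fixed string, so dots are not regex wildcards
    // -x: the whole line must match
    // --: end of options
    $cmd = 'grep -qFx -- ' . escapeshellarg($ip) . ' ' . escapeshellarg($path);
    exec($cmd, $output, $status);
    return $status === 0; // grep exits with 0 when at least one line matched
}
```

Checking the exit status gives the true/false answer directly, without parsing any output.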

Upvotes: 0

cletus

Reputation: 625465

5000 isn't a lot of records. You could easily do this:

$addresses = explode("\n", file_get_contents('filename.txt'));

and search it manually and it'll be quick.
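
For instance, a minimal sketch of the load-and-search step using file() and in_array(); the function name ipListed is hypothetical:

```php
<?php
// Hypothetical helper: load every line into an array and search it.
function ipListed(string $path, string $ip): bool
{
    // file() returns one array element per line; the flags strip the
    // trailing newlines and drop empty lines.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return false; // unreadable file: treat as "not found"
    }
    // The third argument requests strict comparison, so only an identical
    // string counts as a match.
    return in_array($ip, $lines, true);
}
```

The header lines end up in the array too, but they only matter if one of them is itself a bare IP address.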

If you were storing a lot more I would suggest storing them in a database, which is designed for that kind of thing. But for 5000 I think the full load plus brute force search is fine.

Don't optimize a problem until you have a problem. There's no point needlessly overcomplicating your solution.

Upvotes: 5

localshred

Reputation: 2233

I'm not sure if perl's command line tool needs to load the whole file to handle it, but you could do something similar to this:

<?php
...
// -n loops over the lines without modifying the file; only an exact match is printed
$result = system("perl -ne 'print if /^5\\.6\\.7\\.8\$/' yourfile.txt");
if ($result)
    ....
else
    ....
...
?>

Another option would be to store the IPs in separate files based on the first one or two octets:

# 1.2.txt
1.2.3.4
1.2.3.5
1.2.3.6
...

# 5.6.txt
5.6.7.8
5.6.7.9
5.6.7.10
...

... etc.

That way you wouldn't necessarily have to worry about the files being so large you incur a performance penalty by loading the whole file into memory.
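
A lookup against such a layout might be sketched as follows, assuming well-formed IPv4 input and files named after the first two octets as in the listing above; the names shardFileFor and ipInShard are hypothetical:

```php
<?php
// Hypothetical helper: derive the shard file name from the first two octets,
// e.g. "5.6.7.8" -> "5.6.txt". Assumes $ip is a well-formed IPv4 address.
function shardFileFor(string $ip): string
{
    $octets = explode('.', $ip);
    return $octets[0] . '.' . $octets[1] . '.txt';
}

// Hypothetical helper: look for $ip only in the one shard that could hold it.
function ipInShard(string $dir, string $ip): bool
{
    $path = $dir . '/' . shardFileFor($ip);
    if (!is_file($path)) {
        return false; // no shard file means no address with that prefix is stored
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines !== false && in_array($ip, $lines, true);
}
```

Each lookup then touches a single small file instead of the full list.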

Upvotes: 1
