user2395126

Reputation: 546

PHP feof() returning true before the end of file

I have been working on a strange PHP problem the last few days where the feof() function is returning true before the end of a file. Below is a skeleton of my code:

$this->fh = fopen("bigfile.txt", "r");    

while(!feof($this->fh))
{
    $dataString = fgets($this->fh);

    if($dataString === false && !feof($this->fh))
    {
        echo "Error reading file besides EOF";
    }
    elseif($dataString === false && feof($this->fh))
    {
        echo "We are at the end of the file.\n";

        //check status of the stream
        $meta = stream_get_meta_data($this->fh);
        var_dump($meta);
    }
    else
    {
        //else all is good, process line read in 
    }
}

Through lots of testing I have found that the program works fine on every file except one.

The output from the var_dump($meta) is as follows:

 array(9) {
  ["wrapper_type"]=>
  string(9) "plainfile"
  ["stream_type"]=>
  string(5) "STDIO"
  ["mode"]=>
  string(1) "r"
  ["unread_bytes"]=>
  int(0)
  ["seekable"]=>
  bool(true)
  ["uri"]=>
  string(65) "full path of file being read"
  ["timed_out"]=>
  bool(false)
  ["blocked"]=>
  bool(true)
  ["eof"]=>
  bool(true)
}

In attempting to find out what is causing feof to return true before the end of the file I have to guess that either:

A) Something is causing the fopen stream to fail and then nothing is able to be read in (causing feof to return true)

B) There is some buffer somewhere that is filling up and causing havoc

C) The PHP gods are angry

I have searched far and wide to see if anyone else was having this issue and cannot find any instances except in C++ where the file was being read in via text mode instead of binary mode and was causing the issue.

UPDATE: I had my script constantly output the number of times the read function had iterated and the unique ID of the user associated with the entry it found beside it. The script is still failing after line 7172713 out of 7175502, but the unique ID of the last user in the file is showing up on line 7172713. It seems that the problem is for some reason lines are being skipped and are not read. All line breaks are present.

Upvotes: 3

Views: 7231

Answers (3)

AlexeyP0708

Reputation: 432

A lot of time has passed, but this may be useful for others.

Regarding the first question, my guess is that the file is effectively being split into two parts: 8M lines × roughly 200-500 bytes per line ≈ 1,600-4,000 MB, while your memory is 2,048 MB. That would put the interruption somewhere between 6M and 8M lines, i.e. around 7M.

About blank lines.

    $str = "hello\r\n";
    echo $str.false; // equivalent to $str.'';

Perhaps fgets returned false and the result was then written out with a line break appended; since false converts to an empty string, only the line break remains. This may explain why the empty lines appear.

Another reason

test.txt

1
2
3
4
5

In the examples, I will indicate the iterations statically, by directly specifying the code, for clarity

    <?php
        $res=fopen(__DIR__."/test.txt", "r");
        var_dump('1=>',fread($res,2),feof($res)); //we read 2 bytes each since there is a line feed byte
        var_dump('2=>',fread($res,2),feof($res));
        var_dump('3=>',fread($res,2),feof($res));
        var_dump('4=>',fread($res,2),feof($res));
        var_dump('5=>',fread($res,1),feof($res)); //We read one byte since there is no line feed
        var_dump('6=>',fread($res,1),feof($res)); //fread needs a length; nothing is left to read here

Result

string(3) "1=>"
string(2) "1
"
bool(false)
string(3) "2=>"
string(2) "2
"
bool(false)
string(3) "3=>"
string(2) "3
"
bool(false)
string(3) "4=>"
string(2) "4
"
bool(false)
string(3) "5=>"
string(1) "5"
bool(false)
string(3) "6=>"
string(0) ""
bool(true)

We see that the 5th read returned the last byte, but feof($res) was still false afterwards, so a loop would run one more iteration. That next read (the 6th) returns an empty string, and only then does feof return true.

    <?php
       $filesize=filesize(__DIR__."/test.txt");
       $res=fopen(__DIR__."/test.txt", "r");
       echo "---\n";
       var_dump(fread($res,$filesize),feof($res)); //reads exactly the whole file
       echo "---\n";
       var_dump(fread($res,$filesize),feof($res)); //nothing left to read

Result

---
string(9) "1
2
3
4
5"
bool(false)
---
string(0) ""
bool(true)

The examples show that there is one extra iteration, because at the moment when all the bytes of the file have been read, feof has not yet detected the end of the file.

How can this be fixed?

    <?php
       $filesize=filesize(__DIR__."/test.txt")+1;
       $res=fopen(__DIR__."/test.txt", "r");
       var_dump('0=>',fread($res,$filesize),feof($res));

Did you notice? I added one to the file size value.

For myself, I call EOF the "conditional end-of-file byte".

By itself, feof() does not compute anything. It only reports a flag in the stream's metadata that is maintained by the readers (fread, fgetc, fgets and others). A reader determines whether the data ended within the length it was asked to read. If so, the eof flag is set to true; if the reader did not hit the end of the data within $length, eof stays false. This behavior is necessary because other processes can append data dynamically ($mode = 'a+'), so feof() cannot reliably calculate the end of a file on its own. Only the reader can determine whether it has reached the end of the file.
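
One way to let the reader make that decision, sketched here under the assumption that the input is a plain text file processed line by line, is to loop on the return value of fgets() instead of on feof(). The helper name readLines is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: collects every line of a file, looping on the
// return value of fgets() rather than on feof().
function readLines(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = [];
    // fgets() returns false at end of file or on a read error, so the loop
    // never runs an extra iteration with nothing to process.
    while (($line = fgets($fh)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }

    // After the loop, feof() tells a normal EOF apart from a read error.
    $reachedEof = feof($fh);
    fclose($fh);

    if (!$reachedEof) {
        throw new RuntimeException("Read error before end of file: $path");
    }
    return $lines;
}

// Usage with a throwaway file that ends in a line break.
$tmp = tempnam(sys_get_temp_dir(), 'eof');
file_put_contents($tmp, "1\n2\n3\n4\n5\n");
echo count(readLines($tmp)), " lines read\n";
unlink($tmp);
```

With this shape, feof() is consulted only once, after the reader has already reported that nothing more could be read.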

Briefly, calculating the length of the last data block for fread:

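    <?php
        $filesize=filesize(__DIR__."/test.txt");
        $down_size=0; //bytes read so far
        $length=8192; //chunk size
        $res=fopen(__DIR__."/test.txt", "r");
        $buf='';
        while(!feof($res)){
            //if this chunk would end exactly on the last byte, request one byte more so the reader hits EOF
            if(($down_size+$length)===$filesize){$length++;}
            $buf=fread($res,$length);
            $down_size+=strlen($buf);
        }
        fclose($res);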
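
An alternative to adjusting $length, sketched under the assumption that the file is a local plain file that is not growing while it is read, is to treat an empty read as the end of the data. The function name readChunks and its callback parameter are hypothetical.

```php
<?php
// Hypothetical helper: reads a file in fixed-size chunks and passes each
// non-empty chunk to $onChunk. Returns the total number of bytes read.
function readChunks(string $path, int $length, callable $onChunk): int
{
    $res = fopen($path, 'r');
    if ($res === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $total = 0;
    while (!feof($res)) {
        $buf = fread($res, $length);
        // An empty or false result means the previous read already consumed
        // the last byte; stop instead of processing an empty chunk.
        if ($buf === false || $buf === '') {
            break;
        }
        $total += strlen($buf);
        $onChunk($buf);
    }
    fclose($res);
    return $total;
}

// Usage: a 9-byte file read in 3-byte chunks, so the last chunk is an exact fit.
$tmp = tempnam(sys_get_temp_dir(), 'eof');
file_put_contents($tmp, "1\n2\n3\n4\n5");
$bytes = readChunks($tmp, 3, function (string $chunk): void {
    echo strlen($chunk), " bytes\n";
});
echo "total: $bytes\n";
unlink($tmp);
```

The extra iteration still happens when the last chunk fits exactly, but it ends the loop instead of reaching the processing code.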

Upvotes: 0

user2395126

Reputation: 546

fgets() is seemingly at random reading some lines that do have content as empty. The script actually makes it to the end of the file; my test that showed the line numbers being read was lagging behind because of the way I did the error checking (and the way the error checking was written in the 3rd party code). Now the real question is what is causing fgets() and fread() to think that a line is empty even though it is not. I will ask that as a separate question, as that is a change in topic. Thank you all for your help!

Also, just so no one is left hanging: the 3rd party code did not work because it relies on every line having at least a line break. When fgets and fread return an empty string, the script has no way of knowing the line ever existed, so it keeps trying to execute past the end of the file. Below is the slightly modified 3rd party script, which I still consider excellent based on its execution speed.

The original script can be found in the comments here: http://php.net/manual/en/function.fgets.php and I take absolutely no credit for it.

<?php

//File to be opened
$file = "/path/to/file.ext";
//Open file (DON'T USE a+ pointer will be wrong!)
$fp = fopen($file, 'r');
//Read 16meg chunks
$read = 16777216;
//\n Marker
$part = 0;

while(!feof($fp))
{
    $rbuf = fread($fp, $read);
    for($i=$read;$i > 0 || $n == chr(10);$i--)
    {
        $n=substr($rbuf, $i, 1);
        if($n == chr(10))break;
        //If we are at the end of the file, just grab the rest and stop loop
        elseif(feof($fp))
        {
            $i = $read;
            $buf = substr($rbuf, 0, $i+1);
            echo "<EOF>\n";
            break;
        }
    }
    //This is the buffer we want to do stuff with, maybe throw to a function?
    $buf = substr($rbuf, 0, $i+1);

    //output the chunk we just read and mark where it stopped with <break>
    echo $buf . "\n<break>\n";

    //Point marker back to last \n point
    $part = ftell($fp)-($read-($i+1));
    fseek($fp, $part);
}
fclose($fp);

?>

UPDATE: After hours more of searching, analyzing, hair pulling, etc., it seems that the culprit was an uncaught bad character, in this case a ½ character (hex value BD). While generating the file that I was reading from, the script used stream_get_line() to read each line in from its original source. It was then supposed to remove all bad characters (it appears that my regex was not up to par), use str_getcsv() to convert the content to an array, do some processing, and then write to a new file (the one I was trying to read). Somewhere in this process, probably in str_getcsv(), the ½ character caused the whole thing to insert a blank line instead of the data. Several thousand of these were placed throughout the file (wherever the ½ symbol appeared). This made the file appear to be the correct length, but caused the EOF to be reached too quickly when counting input based on a known number of lines.

I want to thank everyone who helped me with this problem, and I am very sorry that the real cause had nothing to do with my question. However, if it hadn't been for everyone's suggestions and questions I would not have looked in the right places.

Lesson learned from this experience - when EOF is reached too quickly the best place to look is for instances of double line breaks. When writing a script that reads from a formatted file a good practice is to check for these. Below is my original code modified to do just that:

$this->fh = fopen("bigfile.txt", "r");    

while(!feof($this->fh))
{
    $dataString = fgets($this->fh);

    //strict comparison: false == "" is true in PHP, so a loose check would also throw at EOF
    if($dataString === "\n" || $dataString === "\r\n" || $dataString === "")
    {
        throw new Exception("Empty line found.");
    }

    if($dataString === false && !feof($this->fh))
    {
        echo "Error reading file besides EOF";
    }
    elseif($dataString === false && feof($this->fh))
    {
        echo "We are at the end of the file.\n";

        //check status of the stream
        $meta = stream_get_meta_data($this->fh);
        var_dump($meta);
    }
    else
    {
        //else all is good, process line read in 
    }
}
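
The same check can also be run as a standalone pass before any processing. This is a minimal sketch, assuming line-oriented input in a single-byte encoding such as ISO-8859-1 (in UTF-8 the byte 0xBD also occurs inside valid multi-byte characters, so the second check would need adjusting). The name findSuspectLines is hypothetical.

```php
<?php
// Hypothetical helper: returns the 1-based numbers of lines that are empty
// or that contain the byte 0xBD (the "½" character in ISO-8859-1).
function findSuspectLines(string $path): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $suspect = ['empty' => [], 'bad_byte' => []];
    $lineNo = 0;
    while (($line = fgets($fh)) !== false) {
        $lineNo++;
        // Strip the line break so "\n" and "\r\n" both count as empty.
        $content = rtrim($line, "\r\n");
        if ($content === '') {
            $suspect['empty'][] = $lineNo;
        } elseif (strpos($content, "\xBD") !== false) {
            $suspect['bad_byte'][] = $lineNo;
        }
    }
    fclose($fh);
    return $suspect;
}

// Usage with a throwaway file: line 2 is blank, line 3 contains 0xBD.
$tmp = tempnam(sys_get_temp_dir(), 'chk');
file_put_contents($tmp, "a,b\n\nx\xBDy\nz\n");
$report = findSuspectLines($tmp);
echo count($report['empty']), " empty, ", count($report['bad_byte']), " with 0xBD\n";
unlink($tmp);
```

Reporting line numbers rather than throwing on the first hit makes it easier to see whether the blank lines line up with the bad characters in the source data.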

Upvotes: 2

user1000456

You must split your file or increase the timeout in php.ini:

upload_max_filesize = 2M
;or whatever size you want

max_execution_time = 60 ; also, higher if you must

This is because, according to the manual, feof() "Returns TRUE if the file pointer is at EOF or an error occurs (including socket timeout); otherwise returns FALSE." See: http://php.net/manual/en/function.feof.php
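
Whether a timeout is actually involved can be checked from the stream metadata, which the question already dumps. The following is only a sketch: describeStreamState is a hypothetical helper, and it relies solely on the timed_out and eof keys returned by stream_get_meta_data().

```php
<?php
// Hypothetical helper: reports why feof() would be returning true (or not),
// based on the metadata of an open stream.
function describeStreamState($fh): string
{
    $meta = stream_get_meta_data($fh);
    if ($meta['timed_out']) {
        return 'timed_out'; // the stream gave up waiting for data
    }
    if ($meta['eof']) {
        return 'eof';       // a reader hit the end of the data
    }
    return 'open';          // more data may still be readable
}

// Usage with a throwaway local file.
$tmp = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($tmp, "x\n");
$fh = fopen($tmp, 'r');
echo describeStreamState($fh), "\n"; // state before anything has been read
while (fgets($fh) !== false) {
    // drain the file
}
echo describeStreamState($fh), "\n"; // state after the reader ran out of data
fclose($fh);
unlink($tmp);
```

In the dump shown in the question, timed_out is false and eof is true, which points to a genuine end of data rather than a timeout.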

Upvotes: 4
