user871784

PHP parsing/typecasting problems

What I'm trying to do is convert some archived CSV data. It worked well on a couple of thousand files: I parse out a date and convert it to a timestamp. However, on one file it somehow doesn't work. I use (int) $string to cast the parsed strings to int values, and it returns int(0). I also tried intval(), with the same result. When I use var_dump($string), I get some weird output, for example string(9) "2008", which should actually be string(4) "2008". I also tried using preg_match on the string, without success. Is this an encoding problem?

Here is the code; it's pretty standard stuff:

date_default_timezone_set('UTC');

function convert_csv($filename)
{
    $target = "tmp.csv";
    $fp = fopen($filename, "r") or die("Can't read the file!");
    $fpo = fopen($target, "w") or die("Can't write the file!");
    // Counter appended to each timestamp. It is initialised here because a
    // variable set at file scope is not visible inside the function.
    $ms = 0;

    while (($line = fgets($fp, 1024)) !== false) {
        // Strip the line ending so it does not end up inside the last field.
        $linearr = explode(",", rtrim($line, "\r\n"));

        $bid = $linearr[3];
        $ask = $linearr[4];

        // Field 2 holds the date and time, e.g. "2008-01-01 00:00:00".
        $datetime = explode(" ", $linearr[2]);
        $date = explode("-", $datetime[0]);
        $year = (int) $date[0];
        $month = (int) $date[1];
        $day = (int) $date[2];
        $clock = explode(":", $datetime[1]);
        $hour = (int) $clock[0];
        $minute = (int) $clock[1];
        $second = (int) $clock[2];
        $time = mktime($hour, $minute, $second, $month, $day, $year);

        // Advance the counter, wrapping from 9 back to 0.
        $ms = ($ms >= 9) ? 0 : $ms + 1;

        $newline = $time . '00' . $ms . ",$ask,$bid,0,0\n";
        fwrite($fpo, $newline);
    }

    fclose($fp);
    fclose($fpo);
    unlink($filename);
    rename($target, $filename);
}

Here is a link to the file we are talking about:


Answers (2)

Fran Marzoa

You could try converting your file to plain ASCII using iconv.

If you are on Linux or a similar system that has the iconv command:

$ iconv -f UTF16 -t ASCII EUR_USD_Week1.csv > clean.csv

Otherwise, you may find the PHP iconv function useful:

http://php.net/manual/en/function.iconv.php
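A minimal sketch of the in-PHP route with iconv(), under stated assumptions: the file is UTF-16 little-endian, may start with a byte order mark (FF FE), and is small enough to load into memory. The function name convert_utf16le_file_to_utf8 is hypothetical. It targets UTF-8 rather than ASCII; for purely ASCII content such as dates and prices the output bytes are identical, and iconv() will not fail on a stray non-ASCII character.

```php
<?php
// Sketch: convert a whole UTF-16LE file to UTF-8 using PHP's iconv().
// Assumptions (not from the original post): little-endian input, optional BOM,
// file fits in memory. The function name is hypothetical.
function convert_utf16le_file_to_utf8(string $source, string $target): void
{
    $raw = file_get_contents($source);
    if ($raw === false) {
        throw new RuntimeException("Can't read $source");
    }
    // Drop the UTF-16LE byte order mark (FF FE) if the file starts with one.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    }
    $utf8 = iconv('UTF-16LE', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException("Invalid UTF-16LE data in $source");
    }
    if (file_put_contents($target, $utf8) === false) {
        throw new RuntimeException("Can't write $target");
    }
}
```

Once converted, the target file can be processed by convert_csv() like any of the other, single-byte files.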

Another Code

The file seems to be encoded in UTF-16, so it is indeed an encoding problem. The string(9) is caused by the null bytes you get when UTF-16 is interpreted as a single-byte encoding.
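A small illustration of where the extra bytes come from, using hypothetical bytes for the text ",2008" as they would appear in a UTF-16LE file. Splitting on the single byte "," leaves the comma's null byte at the start of the next field, which yields 9 bytes instead of 4, and the cast returns 0 because the string does not start with a digit. Stripping the null bytes, as in the last line, is only a quick hack that holds for pure-ASCII content; converting the encoding is the proper fix.

```php
<?php
// Hypothetical bytes of ",2008" in UTF-16LE: every ASCII character
// is followed by a null byte.
$raw = ",\x002\x000\x000\x008\x00";

// explode() splits on the single byte ",", so the comma's trailing null byte
// ends up at the start of the next field.
$fields = explode(",", $raw);
$year = $fields[1];

var_dump(strlen($year));   // int(9)
var_dump(bin2hex($year));  // string(18) "003200300030003800"
var_dump((int) $year);     // int(0): the string does not start with a digit

// Quick hack for ASCII-only data: remove the null bytes before casting.
var_dump((int) str_replace("\0", "", $year)); // int(2008)
```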

This makes the file hard to read with functions like fgets, since they are binary-safe and thus not encoding-aware. You could read the entire file into memory and convert its encoding, but this is inefficient for large files, because the whole file has to be held in memory at once.

I'm not sure if it's possible to read the file properly as UTF-16 using native PHP functions. You might need to write or use an external library.
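One native option that may be worth trying, offered as a sketch rather than a tested recipe: when the iconv extension is enabled, PHP registers convert.iconv.* stream filters, and attaching one with stream_filter_append() converts the data as fgets() reads it, so the file never has to be loaded whole. The sketch assumes UTF-16LE input, and utf16le_lines is a hypothetical helper name. A leading BOM (U+FEFF) survives the conversion as the UTF-8 bytes EF BB BF, so it is stripped from the first line.

```php
<?php
// Sketch: stream a UTF-16LE file line by line as UTF-8 via the
// convert.iconv.* read filter (requires the iconv extension).
// Assumption: little-endian input. The function name is hypothetical.
function utf16le_lines(string $filename): Generator
{
    $fp = fopen($filename, 'r');
    if ($fp === false) {
        throw new RuntimeException("Can't read $filename");
    }
    if (stream_filter_append($fp, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ) === false) {
        fclose($fp);
        throw new RuntimeException("iconv stream filter unavailable");
    }

    $first = true;
    try {
        // fgets() now sees UTF-8, so "\n" is a single byte again.
        while (($line = fgets($fp)) !== false) {
            $line = rtrim($line, "\r\n");
            // Remove a BOM that was converted to EF BB BF on the first line.
            if ($first && strncmp($line, "\xEF\xBB\xBF", 3) === 0) {
                $line = substr($line, 3);
            }
            $first = false;
            yield $line;
        }
    } finally {
        fclose($fp);
    }
}
```

Inside convert_csv(), the fgets() loop could then become foreach (utf16le_lines($filename) as $line) { ... }, with the rest of the parsing unchanged.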
