Reputation: 1257
I'm trying to convert some archived CSV data. It worked well on a couple of thousand files: I parse out a date and convert it to a timestamp. On one file, however, it doesn't work. When I use (int) $string to cast the parsed strings to int values, it returns int(0). I also tried intval(), with the same result. When I use var_dump($string), I get odd output, for example string(9) "2008", which should actually be string(4) "2008". I also tried preg_match on the string, without success. Is this an encoding problem?
Here is some code, it's just pretty standard stuff:
date_default_timezone_set('UTC');
$ms = 0;
function convert_csv($filename)
{
    global $ms; // the counter lives outside the function
    $target = "tmp.csv";
    $fp = fopen($filename, "r") or die("Can't read the file!");
    $fpo = fopen($target, "w") or die("Can't write the file!");
    while (($line = fgets($fp, 1024)) !== false)
    {
        $linearr = explode(",", $line);
        $time = $linearr[2];
        $bid = $linearr[3];
        $ask = $linearr[4];
        // Split "YYYY-MM-DD HH:MM:SS" into date and time parts
        $time = explode(" ", $time);
        $date = explode("-", $time[0]);
        $year = (int) $date[0];
        $month = (int) $date[1];
        $day = (int) $date[2];
        $time = explode(":", $time[1]);
        $hour = (int) $time[0];
        $minute = (int) $time[1];
        $second = (int) $time[2];
        $time = mktime($hour, $minute, $second, $month, $day, $year);
        if ($ms >= 9)
        {
            $ms = 0;
        } else
        {
            $ms++;
        }
        // Append a fake millisecond suffix to the Unix timestamp
        $time = $time.'00'.$ms;
        $newline = "$time,$ask,$bid,0,0\n";
        fwrite($fpo, $newline);
    }
    fclose($fp);
    fclose($fpo);
    // Replace the original file with the converted one
    unlink($filename);
    rename($target, $filename);
}
Here is a link to the file we are talking about:
Upvotes: 0
Views: 134
Reputation: 4534
You could try converting your file to plain ASCII using iconv.
If you are on Linux or a similar system that has the iconv command:
$ iconv -f UTF16 -t ASCII EUR_USD_Week1.csv > clean.csv
Otherwise you may find the PHP iconv function useful:
http://php.net/manual/en/function.iconv.php
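A minimal sketch of the PHP route, assuming the data is UTF-16LE (the actual byte order of the file is not confirmed in this thread). The helper name utf16_to_utf8 and the sample string are made up for illustration; only iconv() itself is the real function.

```php
<?php
// Hypothetical helper: convert a UTF-16 byte string to UTF-8 with iconv().
// Assumption: little-endian input; pass another encoding name if yours differs.
function utf16_to_utf8(string $bytes, string $from = 'UTF-16LE'): string
{
    $converted = iconv($from, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from $from");
    }
    return $converted;
}

// Stand-in for data read from the problem file.
$raw  = iconv('UTF-8', 'UTF-16LE', "2008-01-02 03:04:05");
$text = utf16_to_utf8($raw);

// After conversion the usual cast behaves as expected.
$year = (int) explode('-', $text)[0];
var_dump($year);
```

For a real file you could pass the result of file_get_contents($filename) to the helper, with the caveat that this loads the whole file into memory.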
Upvotes: 0
Reputation: 3151
The file seems to be encoded in UTF-16, so it is indeed an encoding problem. The string(9) is caused by the null bytes that you get if UTF-16 is interpreted as a single-byte encoding.
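To make the null bytes visible, here is a small reproduction, assuming the file is UTF-16LE. The sample line is invented; it only mimics the date format from the question.

```php
<?php
// Hypothetical sample line, encoded as UTF-16LE to mimic the problem file.
$line = iconv('UTF-8', 'UTF-16LE', "EUR/USD,2008-01-02 03:04:05");

// Splitting on single-byte delimiters cuts the two-byte code units apart:
// the "," and "-" bytes are removed, but their paired null bytes stay behind.
$fields = explode(',', $line);
$date   = explode('-', $fields[1]);
$year   = $date[0];

var_dump(strlen($year));   // byte length, not character count
var_dump(bin2hex($year));  // shows the 00 bytes between the digits
var_dump((int) $year);     // the leading null byte makes the cast fail
```

The year field comes out as nine bytes: four digits, four interleaved null bytes, and one more null byte left over from the preceding comma. Because the string starts with a null byte rather than a digit, the cast yields 0.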
This makes the file hard to read with functions like fgets, since they are binary-safe and thus not encoding-aware. You could read the entire file into memory and perform an encoding conversion, but this is horribly inefficient for large files.
I'm not sure if it's possible to read the file properly as UTF-16 using native PHP functions. You might need to write or use an external library.
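One native option that may be worth trying is PHP's convert.iconv.* stream filter, which is available when the iconv extension is enabled and decodes the bytes as they are read. The sketch below assumes UTF-16LE input and uses an invented two-line sample file, so treat it as a starting point rather than a tested fix for the file in question.

```php
<?php
// Build a small UTF-16LE sample file (hypothetical data standing in for the real CSV).
$path = tempnam(sys_get_temp_dir(), 'utf16');
file_put_contents($path, iconv('UTF-8', 'UTF-16LE',
    "1,EUR/USD,2008-01-02 03:04:05,1.4701,1.4703\n" .
    "2,EUR/USD,2008-01-02 03:04:06,1.4702,1.4704\n"));

$fp = fopen($path, 'r');
if ($fp === false) {
    throw new RuntimeException("Can't read $path");
}

// Assumption: the file is UTF-16LE. The filter converts to UTF-8 on the fly,
// so fgets() sees ordinary single-byte digits and delimiters.
stream_filter_append($fp, 'convert.iconv.UTF-16LE/UTF-8', STREAM_FILTER_READ);

$years = [];
while (($line = fgets($fp)) !== false) {
    $fields  = explode(',', rtrim($line, "\r\n"));
    $years[] = (int) substr($fields[2], 0, 4);
}
fclose($fp);
unlink($path);

var_dump($years);
```

If the real file starts with a byte order mark, it may survive the conversion as an extra character at the start of the first line, so the first field might need trimming.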
Upvotes: 2