Reputation: 21522
I'm using this bit of code and things like it quite a lot to parse a file...
if ($dataLines[0] == "0 HEAD" &&
($dataLines[count($dataLines) - 1] == "0 TRLR" ||
$dataLines[count($dataLines) - 2] == "0 TRLR")) {
// More Code Here
}
I've added the following else for debugging...
} else {
$this->error("import(): File is not a gedcom datafile: " . $filename);
$this->debug("import(): Lines: " . count($dataLines));
$this->debug("import(): Lines: dataLines[0] = [" . $dataLines[0] ."]");
$this->debug("import(): Lines: dataLines[count($dataLines) - 1] = [" . $dataLines[count($dataLines) - 1] ."]");
}
When I parse an ANSI file things work. I've been given a file in UTF-8 and things break. My output is:
Starting gedcom read
import(): File is not a gedcom datafile: /Users/jzaun/Development/www/assets/trees/greek/tree.ged
import(): Lines: 10712
import(): Lines: dataLines[0] = [0 HEAD ]
and I get an error too:
PHP Fatal error: Uncaught exception 'ErrorException' with message 'Array to string conversion' in /Users/jzaun/Development/www/classes/App/Gedcom.php:478 Stack trace:
To load in the file I'm using:
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content));
}
$data = $this->file_get_contents_utf8($filename);
$dataLines = explode("\n", trim($data));
if (count($dataLines) == 1) {
$dataLines = explode("\r", trim($data));
}
I'm guessing I am either loading the file wrong or I shouldn't be doing things like $dataLines[0] == "0 HEAD". How should I go about parsing the file so it works with UTF-8?
Upvotes: 0
Views: 126
Reputation: 449613
The ï»¿ at the start of the first line (bytes EF BB BF) is the UTF-8 Byte Order Mark (BOM). It's likely your problem, as it changes the first line and makes your comparison fail.
You will have to ignore/remove the first three bytes if they equal EF BB BF ("\xEF\xBB\xBF" in PHP). See this answer for one example.
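As a rough sketch of the BOM check described above, something like the following might work. The helper names strip_utf8_bom() and split_lines() are made up for illustration and are not part of your code. The sample data is invented too. The per-line trim is an assumption based on the trailing character visible in your "[0 HEAD ]" output, which may be a leftover \r from CRLF line endings.

```php
<?php
// Hypothetical helper (not from the original post): removes a leading
// UTF-8 BOM (bytes EF BB BF) if one is present, otherwise returns the
// text unchanged.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        // Cast guards against older PHP versions returning false
        // when the text consists of the BOM only.
        return (string) substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: strips the BOM first (trim() alone does not
// remove it), then splits on \r\n, \r or \n and trims every line so a
// stray \r cannot break comparisons such as == "0 HEAD".
function split_lines(string $text): array
{
    $lines = preg_split('/\r\n|\r|\n/', trim(strip_utf8_bom($text)));
    return array_map('trim', $lines);
}

// Invented sample data: a tiny GEDCOM-like file with a BOM and CRLF endings.
$data = "\xEF\xBB\xBF0 HEAD\r\n1 CHAR UTF-8\r\n0 TRLR\r\n";
$dataLines = split_lines($data);

var_dump($dataLines[0] === "0 HEAD");                     // bool(true)
var_dump($dataLines[count($dataLines) - 1] === "0 TRLR"); // bool(true)
```

In your loader you would apply the BOM check to the result of file_get_contents_utf8() before exploding it into lines. Separately, the "Array to string conversion" error most likely comes from the last debug line: inside a double-quoted string, $dataLines in "dataLines[count($dataLines) - 1]" is interpolated as the whole array, so that label probably needs single quotes or an escaped \$.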
Upvotes: 1