Svish

Reputation: 158051

How to list files with special (Norwegian) characters

I'm doing a simple (I thought) directory listing of files, like so:

$files = scandir(DOCROOT.'files');

foreach($files as $file)
{
    echo '  <li>'.$file.PHP_EOL;
}

The problem is that the file names contain Norwegian characters (æ, ø, å), and for some reason they come out as question marks. Why is this?

I can apparently fix(?) it by doing this before I echo it out:

$file = mb_convert_encoding($file, 'UTF-8', 'pass');

But it makes little sense to me why this helps, since, according to the docs, 'pass' should mean that no character encoding conversion is performed... *confused*


Here is an example: http://random.geekality.net/files/index.php

Upvotes: 4

Views: 733

Answers (1)

deceze

Reputation: 522085

It appears the file names are encoded in ISO-8859-1 (Latin-1), but the page is interpreted as UTF-8 by default. The characters do not come out as "question marks", but as Unicode replacement characters (�). That means the browser, which tries to interpret the byte stream as UTF-8, has encountered a byte sequence that is invalid in UTF-8 and inserts the replacement character at that point instead. Switch your browser's encoding to ISO-8859-1 and see the difference (View > Encoding > ...).
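
To illustrate why the browser chokes, here is a minimal sketch using "æ" as the example character. The variable names are mine and purely illustrative; it assumes the mbstring extension is loaded.

```php
<?php
// "æ" is the single byte 0xE6 in ISO-8859-1, but the two bytes 0xC3 0xA6 in UTF-8.
$latin1 = "\xE6";
$utf8   = "\xC3\xA6";

// A lone 0xE6 byte is not a valid UTF-8 sequence, so a browser decoding
// the page as UTF-8 shows the replacement character for it.
$latin1IsValidUtf8 = mb_check_encoding($latin1, 'UTF-8');
$utf8IsValidUtf8   = mb_check_encoding($utf8, 'UTF-8');

// Converting from ISO-8859-1 yields the proper two-byte UTF-8 sequence.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

var_dump($latin1IsValidUtf8);  // bool(false)
var_dump($utf8IsValidUtf8);    // bool(true)
var_dump(bin2hex($converted)); // string(4) "c3a6"
```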

So if you designate your page as UTF-8 encoded, you need to convert the strings from ISO-8859-1 to UTF-8. Use mb_convert_encoding($file, 'UTF-8', 'ISO-8859-1') to do so.
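
Applied to a directory listing, that could look roughly like the sketch below. The helper name to_utf8_names() is hypothetical, and the hard-coded array only stands in for the scandir() result so the example runs on its own. It also assumes the file system really delivers ISO-8859-1 names, which you should verify for your server.

```php
<?php
// Hypothetical helper: converts every name from an assumed source
// encoding (ISO-8859-1 by default) to UTF-8.
function to_utf8_names(array $names, string $from = 'ISO-8859-1'): array
{
    return array_map(function (string $name) use ($from): string {
        return mb_convert_encoding($name, 'UTF-8', $from);
    }, $names);
}

// Stand-in for scandir(DOCROOT.'files'): "blåbær.txt" as ISO-8859-1 bytes.
$files = ["bl\xE5b\xE6r.txt", "readme.txt"];

foreach (to_utf8_names($files) as $file) {
    // Escape the name as well, since file names may contain < or &.
    echo '  <li>' . htmlspecialchars($file, ENT_QUOTES, 'UTF-8') . '</li>' . PHP_EOL;
}
```

Plain ASCII names such as readme.txt pass through unchanged, because ASCII is encoded identically in ISO-8859-1 and UTF-8.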

Why it works when you specify 'pass' as the $from encoding, I can only guess. What you're telling mb_convert_encoding with that is to convert from 'pass' to UTF-8. My guess is that this makes mb_convert_encoding fall back to the mb_internal_encoding value as the $from encoding, which happens to be ISO-8859-1 (Latin-1). I suppose it's equivalent to 'auto' when used as the $from parameter.

Upvotes: 1
