Quartermain

Reputation: 173

PHP doesn't recognize filename with accented character "é" in it

I am trying to check with PHP whether a file exists. The file name contains the character "é": 13067-AP-03 A - Situation projetée.pdf.

The code I use to check if the file exists is:

$filename = 'C:/13067-AP-03 A - Situation projetée.pdf';

if (file_exists($filename)) 
{
    echo "The file exists";
} else 
{
    echo "The file does not exist";
}

The problem is that the check always reports that the file doesn't exist. If I remove the "é" from the name, it reports that the file does exist.

It looks like PHP doesn't recognize the file when its name contains an accented character. I tried the following:

urlencode($filename);
addslashes($filename);
utf8_encode($filename);

None of which worked. I also tried:

setlocale(LC_ALL, "en_US.utf8");

It may be worth noting that when I read the filename straight from PHP I get the following:

13067-AP-03 A - Situation projet�e.pdf

I have to do the following to have the filename displayed correctly:

$filename = iconv( "CP437", 'UTF-8', $filename);

I was wondering if someone had the same problem before and could help me out with this one. All help is greatly appreciated.

For those who are interested, the script runs on a Windows machine.

Strangely, this worked: I copied all the source code from Sublime Text 3 to Notepad and saved it from Notepad, overwriting the PHP file.

Now when I check to see if the file exists it shows the following filename that exists:

13067-AP-03 A - Situation projet�e.pdf

The only remaining problem is that I want to download the file using file_get_contents, but file_get_contents doesn't interpret that character correctly.

Upvotes: 5

Views: 1797

Answers (4)

Dula

Reputation: 1412

I found this function which helped me with a similar problem.

Source: https://www.php.net/urldecode

Thank you alejandro at devenet dot net.

function to_utf8( $string ) {
    // From http://w3.org/International/questions/qa-forms-utf-8.html
    if ( preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string) ) {
        return $string;
    } else {
        return iconv( 'CP1252', 'UTF-8', $string);
    }
}

Upvotes: 0

Frederick Zhang

Reputation: 3683

I think this is a problem with PHP on Windows. I downloaded a Windows binary of PHP onto my Japanese-language Windows machine and successfully reproduced your problem.

According to https://bugs.php.net/bug.php?id=47096:

So, if you have a generic name of a file (along with its path) as a Unicode string $u (for example UTF-8 encoded) and you want to try to save it with that name under Windows, you must first check the current locale calling setlocale(LC_CTYPE, 0) to retrieve the current code page, then you must convert $u to an array of bytes according to the code page; if one or more code points have no counterpart in the current code page, the file cannot be saved with that name from PHP. Dot.

My code page is CP932; you can see yours by running chcp in cmd.

So the code would be expected to look like this:

$filename='C:\Users\Frederick\Desktop\13067-AP-03 A - Situation projetée.pdf';
$filename=mb_convert_encoding($filename, 'CP932', 'UTF-8');
var_dump($filename);
var_dump(file_exists($filename));

But this won't work! Why? Because CP932 doesn't contain the character é!
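One way to catch this case before calling file_exists is a round-trip check: convert the UTF-8 name to the target code page and back, and treat any difference as "not representable". The following is only a minimal sketch. The helper name to_code_page is hypothetical, it assumes the mbstring extension is loaded, and the code page is passed in by hand rather than detected from the system.

```php
<?php
// Hypothetical helper (not part of PHP): returns $utf8Name encoded in
// $codePage, or null if some character has no counterpart in that code page.
function to_code_page(string $utf8Name, string $codePage): ?string
{
    // mb_convert_encoding() does not fail on unrepresentable characters;
    // it replaces them with a substitute character instead.
    $converted = mb_convert_encoding($utf8Name, $codePage, 'UTF-8');

    // Converting back to UTF-8 reveals whether anything was lost.
    $roundTrip = mb_convert_encoding($converted, 'UTF-8', $codePage);

    return $roundTrip === $utf8Name ? $converted : null;
}

// "\u{E9}" is "é", written as an escape so the example does not depend on
// the encoding this source file is saved in.
$name  = "C:/13067-AP-03 A - Situation projet\u{E9}e.pdf";
$local = to_code_page($name, 'CP932');

if ($local === null) {
    echo "The name cannot be represented in this code page\n";
} else {
    var_dump(file_exists($local));
}
```

On a machine whose code page does contain the character (Windows-1252, for instance), the same helper returns the converted byte string, which can then be passed to file_exists.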

According to https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396:

NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.

Windows itself uses UTF-16LE, which is called Unicode by Microsoft, to save its file names. But PHP doesn't support a UTF-16LE encoded file name.
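To make the mismatch concrete, here is a small sketch that prints the bytes of "é" in three encodings. It assumes the mbstring extension is available; the variable names are only illustrative.

```php
<?php
// "é" written as an escape so the result does not depend on how this
// source file itself is saved.
$utf8 = "\u{E9}";

$bytes = [
    'UTF-8'        => bin2hex($utf8),
    'Windows-1252' => bin2hex(mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8')),
    'UTF-16LE'     => bin2hex(mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8')),
];

foreach ($bytes as $encoding => $hex) {
    echo $encoding, ': ', $hex, "\n";
}
// UTF-8: c3a9
// Windows-1252: e9
// UTF-16LE: e900
```

The same character is three different byte sequences, so a name that is correct in one encoding will not match the name on disk when it is interpreted in another.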

In conclusion, I'm sorry to say that I cannot find a way to solve the problem on Windows other than avoiding such characters when naming the files. I also do not think that the PHP team will solve the problem in the future.

Upvotes: 3

Zebra North

Reputation: 11492

Make sure that your text editor is saving the file as "UTF-8 without BOM".

The BOM is the Byte Order Mark, a short byte sequence placed at the start of a file that lets software reading it detect the encoding and, for UTF-16 and UTF-32, whether it was saved as little-endian or big-endian. In UTF-8 it is the three bytes EF BB BF. The PHP interpreter does not strip these bytes; it treats them as content before the opening tag, so you must save the file without the byte order mark.
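If you are unsure how the editor saved the file, you can inspect its first bytes. This is a minimal sketch; the helper name starts_with_utf8_bom is hypothetical, and the UTF-8 BOM is the byte sequence EF BB BF.

```php
<?php
// Hypothetical helper: true if $bytes begins with the UTF-8 byte order mark.
function starts_with_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Demonstration on in-memory strings standing in for file contents.
var_dump(starts_with_utf8_bom("\xEF\xBB\xBF<?php echo 1;")); // bool(true)
var_dump(starts_with_utf8_bom("<?php echo 1;"));             // bool(false)
```

To check a real script, pass it the result of file_get_contents on that script's path.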

Upvotes: -1

João Reis

Reputation: 131

Try this at the start of your PHP file:

<?php
header('Content-Type: text/html; charset=utf-8');
?>

Upvotes: -2
