Zsub

Reputation: 1797

Make PHP pathinfo() return the correct filename if the filename is UTF-8

When using PHP's pathinfo() function on a filename known to be UTF-8, it does not return the correct value, unless there are 'normal' characters in front of the special character.

Examples:
pathinfo('aä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => aä.pdf
[extension] => pdf
[filename] => aä
)  

which is fine and dandy, but pathinfo('äa.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => a.pdf
[extension] => pdf
[filename] => a
)  

Which is not quite what I was expecting. Even worse, pathinfo('ä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => .pdf
[extension] => pdf
[filename] => 
)  

Why does it do this? This goes for all accented characters I have tested.

Upvotes: 37

Views: 13822

Answers (7)

LF-DevJourney

Reputation: 28529

As the PHP manual entry for pathinfo() notes:

Caution

pathinfo() is locale aware, so for it to parse a path containing multibyte characters correctly, the matching locale must be set using the setlocale() function.

and the example in the manual

Upvotes: 0

Carles Figuera

Reputation: 91

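// Class method. Intended for a bare filename with PATHINFO_BASENAME or
// PATHINFO_FILENAME, where pathinfo() returns a string: the leading space
// keeps a leading multibyte character from being dropped and is removed again.
private function _pathinfo($path, $options = PATHINFO_FILENAME) {
  $result = pathinfo(' ' . $path, $options);
  return substr($result, 1);
}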

Upvotes: 0

sgv_test

Reputation: 1285

Set a UTF-8 locale before calling pathinfo():

setlocale(LC_ALL,'en_US.UTF-8');
pathinfo($OriginalName, PATHINFO_FILENAME);
pathinfo($OriginalName, PATHINFO_BASENAME);
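
Calling setlocale() changes the locale for the whole process, and it returns false when the requested locale is not installed. A minimal sketch that limits the change to LC_CTYPE and restores the previous setting afterwards could look like the following. The helper name pathinfo_utf8() and the candidate locale names are assumptions for illustration, not part of PHP or of the answer above.

```php
<?php
// Hypothetical helper (not part of PHP): run pathinfo() under a UTF-8
// LC_CTYPE locale, then restore whatever locale was active before.
function pathinfo_utf8(string $path, ?int $flags = null)
{
    // Passing "0" only queries the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // The locale names are assumptions; setlocale() tries them in order and
    // returns false if none of them is available on this system.
    $applied = setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

    try {
        return $flags === null ? pathinfo($path) : pathinfo($path, $flags);
    } finally {
        // Only restore if we actually changed something.
        if ($applied !== false && $previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}

var_dump(pathinfo_utf8('ä.pdf', PATHINFO_FILENAME));
```

If none of the candidate locales is installed, the helper falls back to plain pathinfo() behaviour, so it is worth checking which locales exist on the server (for example with `locale -a`).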

Upvotes: 18

user1836049

Reputation:

pathinfo() handles plain ASCII characters correctly.

Based on that, we can encode the input to ASCII-only characters with urlencode() and still use pathinfo() for the actual parsing.

Finally, we decode the output values back to their original form.

A demo follows.

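function _pathinfo($path, $options = null)
{
    // Encode each path segment separately so '/' still separates directories.
    $path = implode('/', array_map('urlencode', explode('/', $path)));
    $parts = null === $options ? pathinfo($path) : pathinfo($path, $options);
    // With a single PATHINFO_* flag, pathinfo() returns a string, not an array.
    if (!is_array($parts)) {
        return urldecode($parts);
    }
    foreach ($parts as $field => $value) {
        $parts[$field] = urldecode($value);
    }
    return $parts;
}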
// calling
_pathinfo('すtest.jpg');
_pathinfo('すtest.jpg', PATHINFO_EXTENSION);

Upvotes: 1

Timo Kähkönen

Reputation: 12210

I have used these functions in PHP 5.3.3 - 5.3.18 to handle the UTF-8 issue in basename() and pathinfo().


if (!function_exists("mb_basename"))
{
  function mb_basename($path)
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    $base = basename($path);
    $base = str_replace($separator, "", $base);
    return $base;
  }
}
if (!function_exists("mb_pathinfo"))
{
  function mb_pathinfo($path, $opt = "")
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    if ($opt == "") $pathinfo = pathinfo($path);
    else $pathinfo = pathinfo($path, $opt);

    if (is_array($pathinfo))
    {
      $pathinfo2 = $pathinfo;
      foreach($pathinfo2 as $key => $val)
      {
        $pathinfo[$key] = str_replace($separator, "", $val);
      }
    }
    else if (is_string($pathinfo)) $pathinfo = str_replace($separator, "", $pathinfo);
    return $pathinfo;
  }
}

Upvotes: 10

Zsub

Reputation: 1797

A temporary work-around for this problem appears to be to make sure there is a 'normal' character in front of the accented characters, like so:

function getFilename($path)
{
    // if there's no '/', we're probably dealing with just a filename
    // so just put an 'a' in front of it
    if (strpos($path, '/') === false)
    {
        $path_parts = pathinfo('a'.$path);
    }
    else
    {
        $path= str_replace('/', '/a', $path);
        $path_parts = pathinfo($path);
    }
    return substr($path_parts["filename"],1);
}

Note that we replace all occurrences of '/' with '/a' but this is okay, since we return starting at offset 1 of the result. Interestingly enough, the dirname part of pathinfo() does seem to work, so no workaround is needed there.

Upvotes: 9
