Mille
Mille

Reputation: 743

php sprintf() with foreign characters?

Seams to be like sprintf have a problem with foregin characters? Or is it me doing something wrong? Looks like it work when removing chars like åäö from the string though. Should that be necessary?

I want the following lines to be aligned correctly for a report:

2011-11-27   A1823    -Ref. Leif  -           12 873,00    18.98
2011-11-30   A1856    -Rättat xx -            6 594,00    19.18

I'm using sprintf() like this: %-12s %-8s -%-10s -%20s %8.2f

Using: php-5.3.23-nts-Win32-VC9-x86

Upvotes: 28

Views: 14604

Answers (4)

Jehong Ahn
Jehong Ahn

Reputation: 2387

Problem

There is no multibyte format functions.

Idea

You can't convert input strings. You should change format lengths. A format %4s means 4 widths (not characters - see footnote). But PHP format functions count bytes. So you should add format lengths to bytes - widths.

Implementations

from @nimmneun

function mb_sprintf($format, ...$args) {
    $params = $args;
    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return $length[0] + strlen($value) - mb_strwidth($value);
    };
    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);
    return sprintf($format, ...$args);
}

And don't forget another option str_pad($input, $length, $pad_char=' ', STR_PAD_RIGHT)

function mb_str_pad(...$args) {
    $args[1] += strlen($args[0]) - mb_strwidth($args[0]);
    return str_pad(...$args);
}

Footnote

Asian characters have 3 bytes and 2 width and 1 character length. If your format is %4s and the input is one asian character, you should need two spaces (padding) not three.

Upvotes: 0

Martin Prikryl
Martin Prikryl

Reputation: 202088

Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8).

For details see:
https://www.php.net/manual/en/language.types.string.php#language.types.string.details

Most string functions in PHP have multibyte equivalent though (with the mb_ prefix). But the sprintf does not.

There's a user comment (by "viktor at textalk dot com") with multibyte implementation of the sprintf on the function's documentation page at php.net. It may work for you:
https://www.php.net/manual/en/function.sprintf.php#89020

Upvotes: 13

nimmneun
nimmneun

Reputation: 1159

I was actually trying to find out if PHP ^7 finally has a native mb_sprintf() but apparently no xD.

For the sake of completeness, here is a simple solution I've been using in some old projects. It just adds the diff between strlen & mb_strlen to the desired $targetLengh. The non-multibyte example is just added for the sake of easy comparison =).

$text = "Gultigkeitsprufung ist fehlgeschlagen: %{errors}";
$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";

$targetLength = 60;
$mbTargetLength = strlen($mbText) - mb_strlen($mbText) + $targetLength;
$mbRussianTargetLength = strlen($mbTextRussian) - mb_strlen($mbTextRussian) + $targetLength;

printf("%{$targetLength}s\n", $text);
printf("%{$mbTargetLength}s\n", $mbText);
printf("%{$mbRussianTargetLength}s\n", $mbTextRussian);

result

            Gultigkeitsprufung ist fehlgeschlagen: %{errors}
            Gültigkeitsprüfung ist fehlgeschlagen: %{errors}
                              Проверка не удалась: %{errors}

update 2019-06-12


@flowtron made me give it another thought. A simple mb_sprintf() could look like this.

function mb_sprintf($format, ...$args) {
    $params = $args;

    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return strlen($value) - mb_strlen($value) + $length[0];
    };

    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);

    return sprintf($format, ...$args);
}

echo mb_sprintf("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf("%-10s %-10s %10s\n", 'this', 'works', 'ok');

result

thüs       wörks              ök
this       works              ok

I only did some happy path testing here, but it works for PHP >=5.6 and should be good enough to give ppl an idea on how to encapsulate the behavior. It does not work with the repetition/order modifiers though - e.g. %1$20s will be ignored/remain unchanged.

Upvotes: 14

Vestman
Vestman

Reputation: 396

If you're using characters that fit in the ISO-8859-1 character set, you can convert the strings before formatting, and convert the result back to UTF8 when you are done

utf8_encode(sprintf("%-12s %-8s", utf8_decode($paramOne), utf8_decode($paramTwo))

Upvotes: 5

Related Questions