mgutt

Reputation: 6177

Find specific UTF8 chars independent of php code charset?

I'd like to match some specific UTF-8 characters, in my case German umlauts. This is our example code:

{UTF-8 file}
<?php
$search = 'ä,ö,ü';
$replace = 'ae,oe,ue';
$string = str_replace(explode(',', $search), explode(',', $replace), $string);
?>

This file is saved as UTF-8. Now I'd like to ensure that the code works regardless of which of the (most commonly used) charsets the source file is saved in.

Is this the way to go (using a UTF-8 check)?

{ISO file}
<?php
$search = 'ä,ö,ü';
$search = preg_match('~~u', $search) ? $search : utf8_encode($search);
$replace = 'ae,oe,ue';
$string = str_replace(explode(',', $search), explode(',', $replace), $string);
?>

Upvotes: 1

Views: 236

Answers (1)

deceze

Reputation: 522005

  1. You should be in control of what your source code is encoded as; it'd be very weird to suddenly have its encoding change out from under you.
  2. If that is actually a legitimate concern you want to counteract, then you can't even rely on your source code being either Latin-1 or UTF-8; it could be any number of other encodings (though admittedly in practice Latin-1 is a pretty common guess). utf8_encode only converts from ISO-8859-1 (Latin-1) to UTF-8, so it is not guaranteed to fix your problem at all.
  3. To be 100% agnostic of your source code file's encoding, denote your characters as raw bytes:

    $search = "\xC3\xA4,\xC3\xB6,\xC3\xBC"; // ä, ö and ü in UTF-8
    
  4. Note that this still won't guarantee what encoding $string will be in; you'll need to know and/or control its encoding separately from the issue at hand. At some point you just have to nail down the encodings you use; you can't be agnostic of them all the way through.
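
Points 3 and 4 can be put together in a minimal sketch. The function name `transliterate_umlauts` is hypothetical, chosen only for illustration, and the sketch assumes that `$string` is already known to be UTF-8:

```php
<?php
// Hypothetical helper, not part of PHP: replaces ä, ö, ü in a UTF-8 string.
function transliterate_umlauts(string $utf8): string
{
    // Byte escapes keep this source file pure ASCII,
    // so the encoding the file is saved in no longer matters.
    $search  = ["\xC3\xA4", "\xC3\xB6", "\xC3\xBC"]; // ä, ö, ü in UTF-8
    $replace = ['ae', 'oe', 'ue'];

    return str_replace($search, $replace, $utf8);
}

// UTF-8 input ("Bär für", written as byte escapes): the umlauts are replaced.
echo transliterate_umlauts("B\xC3\xA4r f\xC3\xBCr"), "\n"; // prints "Baer fuer"

// Latin-1 input ("Bär" with ä as the single byte 0xE4): nothing matches,
// so the string comes back unchanged. This is the limitation from point 4.
var_dump(transliterate_umlauts("B\xE4r") === "B\xE4r"); // bool(true)
```

The second call shows why the encoding of `$string` still has to be known: the UTF-8 byte sequences simply never occur in a Latin-1 string, so no replacement takes place.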

Upvotes: 1
