proximus

Reputation: 689

Comparing two Unicode strings in PHP

I am stuck comparing two Unicode strings in PHP that both contain the special character 'ö'. One string comes from $_GET, the other is a folder name read from the filesystem with scandir(). The two strings look identical to me. Running

var_dump($filter);
var_dump($tail . '/' . $k);

on them also shows the same text, but with different string lengths (?!):

string '/blöb' (length=7)
string '/blöb' (length=6)

My snippet comparing them looks as follows:

if($filter == ($tail . '/' . $k)) {
    /* ... */
}

What's going on here?

Additional information: $tail is an empty string:

string '' (length=0)

Upvotes: 3

Views: 6018

Answers (2)

Florian Margaine

Reputation: 60717

Can you try passing both strings through utf8_encode() and comparing the results? PHP strings have no native Unicode support, so the manual suggests utf8_encode()/utf8_decode() for some basic Unicode handling.

http://php.net/manual/en/language.types.string.php

Upvotes: -1

Ariel

Reputation: 26753

See Unicode equivalence (http://en.wikipedia.org/wiki/Unicode_equivalence) and use PHP's Normalizer class (http://www.php.net/manual/en/class.normalizer.php).

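You probably have a decomposed character in the longer string, meaning a plain o followed by a combining diaeresis (umlaut) character that overlays the previous character. In UTF-8, the precomposed ö (U+00F6) takes 2 bytes, while o plus the combining diaeresis (U+0308) takes 3 bytes, which accounts for the lengths of 6 and 7.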

Normalizing both strings to the same form with Normalizer::normalize() fixes mismatches like that.
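
A minimal sketch of that comparison, assuming the intl extension is installed (it provides the Normalizer class) and PHP 7 or later for the "\u{...}" escapes. The helper name unicode_equal() and the sample strings are made up for illustration; they stand in for $filter and the scandir() entry from the question.

```php
<?php
// Hypothetical helper: compare two UTF-8 strings after normalizing
// both to NFC (canonical composition).
function unicode_equal(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);

    // normalize() returns false on failure; treat that as "not equal".
    if ($na === false || $nb === false) {
        return false;
    }
    return $na === $nb;
}

$decomposed  = "/blo\u{0308}b"; // 'o' + combining diaeresis
$precomposed = "/bl\u{00F6}b";  // precomposed 'ö'

var_dump(strlen($decomposed));                      // int(7)
var_dump(strlen($precomposed));                     // int(6)
var_dump($decomposed == $precomposed);              // bool(false)
var_dump(unicode_equal($decomposed, $precomposed)); // bool(true)
```

In the question's snippet, the condition would then read something like unicode_equal($filter, $tail . '/' . $k).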

As a side note, you should always normalize input that you compare for equivalence. Usernames are an example: you want to make sure two people can't choose the same username, even if the binary representations of the strings happen to differ.
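
The username case could look roughly like this. It is only a sketch: canonical_username() and the in-memory $taken array are hypothetical stand-ins for whatever storage an application really uses, and the intl extension is assumed to be available.

```php
<?php
// Hypothetical helper: reduce a username to one canonical form (NFC)
// before storing it or checking it for uniqueness.
function canonical_username(string $name): string
{
    $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
    if ($normalized === false) {
        // normalize() reports failure by returning false.
        throw new InvalidArgumentException('Username could not be normalized.');
    }
    return $normalized;
}

// Hypothetical registry of taken names, keyed by canonical form.
$taken = [];
$taken[canonical_username("bl\u{00F6}b")] = true; // precomposed 'ö'

// Same visible name, typed with 'o' + combining diaeresis.
$candidate = "blo\u{0308}b";
var_dump(isset($taken[canonical_username($candidate)])); // bool(true)
```

Normalizing once at the point of input keeps later comparisons simple, since plain === on the stored values is then enough.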

Upvotes: 3
