Marc van Nieuwenhuijzen

Reputation: 1657

Visually same string gives different var_dumps in php

UPDATE: The answer is here: PHP unserialize fails with non-encoded characters?

I'm trying to match objects with in_array. This works fine except for the object that has this string as a property. Visually the two strings are the same, but when I var_dump them, PHP reports different lengths.

var_dump results:

string(26) "Waar    zijn mijn centjes👼"
string(31) "Waar    zijn mijn centjes👼"

What could be the cause? Some ASCII value I don't know of?

Upvotes: 2

Views: 1761

Answers (2)

TheoPlatica

Reputation: 1271

Here is my case:

$plan_name: string(29) "check-up & template optimizer"
$cell_comp: string(33) "check-up & template optimizer"

The problem here was the '&': in the 1st string it was the literal '&' character, while in the 2nd string it was the HTML entity &amp;, which is displayed the same way.

If you don't understand the difference, take a look here (compare the 'Symbol' and 'HTML Number' columns): https://ascii.cl/htmlcodes.htm

The solution: apply the same function to both strings, either:

  • html_entity_decode(), which converts HTML entities back to their corresponding characters; or
  • htmlentities(), which converts characters to their corresponding HTML entities.

Here is how the final result looked for me:

$plan_name = html_entity_decode( strtolower( sanitize_text_field($key) ) );
$cell_comp = html_entity_decode( strtolower( sanitize_text_field($cell_2) ) );

if( $plan_name == $cell_comp ) :
 ...
endif;
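The two options listed above can be tried without WordPress. This is a minimal sketch that drops sanitize_text_field() and strtolower() and hard-codes the two strings as assumptions (the 33-byte one is taken to contain the entity &amp;). One caveat: by default htmlentities() re-encodes an existing &amp; as &amp;amp;, so the second option only makes the strings equal when its $double_encode argument is false.

```php
<?php
// Hypothetical inputs reconstructed from the var_dump lengths above:
// the 33-byte string is assumed to contain the entity "&amp;".
$planName = 'check-up & template optimizer';
$cellComp = 'check-up &amp; template optimizer';

var_dump(strlen($planName), strlen($cellComp)); // int(29) int(33)
var_dump($planName === $cellComp);              // bool(false)

// Option 1: decode entities in both strings back to plain characters.
$decodedA = html_entity_decode($planName);
$decodedB = html_entity_decode($cellComp);
var_dump($decodedA === $decodedB); // bool(true)

// Option 2: encode both strings, with double_encode = false so that an
// existing "&amp;" is left alone instead of becoming "&amp;amp;".
$encodedA = htmlentities($planName, ENT_QUOTES, 'UTF-8', false);
$encodedB = htmlentities($cellComp, ENT_QUOTES, 'UTF-8', false);
var_dump($encodedA === $encodedB); // bool(true)
```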

Upvotes: 0

Beat

Reputation: 1380

Let's look at the hex dump of your strings:

57616172097a696a6e206d696a6e2063656e746a6573f09f91bc

and

57616172097a696a6e206d696a6e2063656e746a657326237831663437633b

As we can clearly see, the only difference is at the end: f09f91bc becomes 26237831663437633b.
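A hex dump like this can be produced with PHP's bin2hex(). The sketch below reconstructs the two strings from the dumps, assuming the byte 09 after "Waar" is a tab and that the second string ends in the literal text &#x1f47c;. It also locates the first differing byte by XOR-ing the strings: PHP's string XOR yields "\0" wherever the bytes match (truncated to the shorter string), so strspn() over the leading "\0" bytes gives the offset.

```php
<?php
// Assumed reconstructions of the two strings, based on the hex dumps above.
$a = "Waar\tzijn mijn centjes\u{1F47C}"; // emoji as raw UTF-8 bytes
$b = "Waar\tzijn mijn centjes&#x1f47c;"; // emoji as an HTML numeric reference

echo bin2hex($a), "\n";
echo bin2hex($b), "\n";
var_dump(strlen($a), strlen($b)); // int(26) int(31)

// Offset of the first differing byte: XOR is "\0" wherever the bytes match.
$firstDiff = strspn($a ^ $b, "\0");
echo $firstDiff, "\n"; // 22
```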

So what's the difference?

f09f91bc is the UTF-8 encoding (in hex) of U+1F47C BABY ANGEL (👼), so that string is perfectly valid UTF-8.

But 26237831663437633b is no longer the UTF-8 encoding of the character: it is plain ASCII text that reads &#x1f47c;, which is simply HTML's numeric character reference for the baby angel.
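To make the relationship between the two byte sequences concrete, here is a small sketch that decodes the 4-byte UTF-8 sequence f09f91bc by hand (lead byte 11110xxx, then three continuation bytes 10xxxxxx) to recover the code point, and then writes that code point as a hexadecimal HTML reference with sprintf(). It uses only core PHP (7.0+ for the "\u{...}" escape); the variable names are illustrative.

```php
<?php
// The angel as raw UTF-8 bytes ("\u{...}" escapes need PHP >= 7.0)
$angel = "\u{1F47C}";
echo bin2hex($angel), "\n"; // f09f91bc

// Decode the 4-byte UTF-8 sequence by hand: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
$bytes = array_values(unpack('C*', $angel)); // unpack() is 1-indexed, so reindex
$codePoint = (($bytes[0] & 0x07) << 18)
           | (($bytes[1] & 0x3F) << 12)
           | (($bytes[2] & 0x3F) << 6)
           |  ($bytes[3] & 0x3F);
printf("U+%X\n", $codePoint); // U+1F47C

// The same code point written as an HTML hexadecimal numeric character reference
$reference = sprintf('&#x%x;', $codePoint);
echo $reference, "\n";          // &#x1f47c;
echo bin2hex($reference), "\n"; // 26237831663437633b
```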

So somewhere along the way the angel was converted to its HTML numeric character reference, and that is not something that happens just from writing to and reading from a file or a DB. I'd guess it happened somewhere in your output processing.

You can use html_entity_decode() to translate the HTML entities back to their UTF-8 equivalents:

$a = html_entity_decode("Waar\tzijn mijn centjes&#x1f47c;"); // decodes the reference back to 👼
$b = "Waar\tzijn mijn centjes👼";
var_dump($a === $b); // bool(true)

See http://phpfiddle.org/lite/code/n6t1-d9w7 to try the code.
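Applied to the original in_array() problem, one option is to normalize the property whenever an object is created, so that every object stores plain UTF-8 rather than HTML entities. In its default non-strict mode, in_array() compares objects with ==, which is true when they are instances of the same class with equal property values. The class name Note and its property title below are hypothetical stand-ins for the asker's objects; this is a sketch, not the asker's code.

```php
<?php
// Hypothetical stand-in for the asker's objects; class and property names are assumptions.
class Note
{
    public $title;

    public function __construct(string $title)
    {
        // Normalize on construction so every Note stores plain UTF-8, not HTML entities
        $this->title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');
    }
}

$stored = [new Note("Waar\tzijn mijn centjes\u{1F47C}")];
$needle = new Note("Waar\tzijn mijn centjes&#x1f47c;");

// Non-strict in_array() compares objects with ==: same class and equal properties
var_dump(in_array($needle, $stored)); // bool(true)
```

Passing true as the third argument of in_array() would switch to ===, which for objects only matches the very same instance, so it would not help here.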

Upvotes: 1
