tougher

Reputation: 519

JavaScript string comparison fails when comparing Unicode characters

I want to compare two strings in JavaScript that look the same, and yet the equality operator == returns false. One string contains a special character (e.g. the Danish å).

JavaScript code:

var filenameFromJS = "Designhåndbog.pdf";
var filenameFromServer = "Designhåndbog.pdf";

console.log(filenameFromJS == filenameFromServer); // This prints false. Why?

The solution: what worked for me is Unicode normalization, as slevithan pointed out.

I forked my original jsfiddle to make a version using the normalization lib suggested by slevithan. Link: http://jsfiddle.net/GWZ8j/1/.

Upvotes: 18

Views: 20170

Answers (5)

user2428118

Reputation: 8104

Unicode is a complex thing. It has two different ways to represent characters such as á, é, etc.

Certain Unicode characters can be represented in a composed and a decomposed form. For example, the German umlaut-u ü can be represented either by the single character ü or by u followed by the combining diaeresis ¨ (U+0308), which a text renderer then combines.

(The Wikipedia article on Unicode equivalence has more details.)
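To make the two forms concrete, here is a small sketch that writes the ü from the example with explicit escapes. The variable names are mine and only for illustration.

```javascript
// Two ways to write "ü": one precomposed code point, or "u" plus a combining mark.
const composed = "\u00FC";      // U+00FC LATIN SMALL LETTER U WITH DIAERESIS
const decomposed = "u\u0308";   // U+0075 "u" followed by U+0308 COMBINING DIAERESIS

console.log(composed.length);          // 1
console.log(decomposed.length);        // 2
console.log(composed === decomposed);  // false
```

Both strings render as the same glyph, but == and === compare the underlying code units, so they are not equal.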

As you can already see in the URL-encoded version, the hex bytes that make up the character differ between the two versions.

In JavaScript, you can use String.prototype.normalize() to get a normalized form of a string.

For example:

var normalizedFilenameFromJS = "Designhåndbog.pdf".normalize();
var normalizedFilenameFromServer = "Designhåndbog.pdf".normalize();

console.log(normalizedFilenameFromJS === normalizedFilenameFromServer); // This prints true

.normalize() can be called with a parameter to specify the normalization form; see the linked Mozilla Developer article for available options.
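As a rough illustration of the form parameter, the sketch below spells out the filename from the question in both forms using escapes. Which form the server really sends is an assumption on my part; the point is only that both sides must be normalized to the same form before comparing.

```javascript
// Assumed inputs: one side uses the precomposed å, the other uses a + combining ring.
const precomposed = "Designh\u00E5ndbog.pdf";  // å as U+00E5
const combining = "Designha\u030Andbog.pdf";   // a followed by U+030A COMBINING RING ABOVE

console.log(precomposed === combining);                                    // false

// NFC composes, NFD decomposes; either works as long as both sides use the same one.
console.log(precomposed.normalize("NFC") === combining.normalize("NFC"));  // true
console.log(precomposed.normalize("NFD") === combining.normalize("NFD"));  // true

console.log(combining.normalize("NFC").length);  // 17
console.log(combining.normalize("NFD").length);  // 18
```

Calling .normalize() without an argument uses NFC, which is usually the form you want for plain equality checks.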

Upvotes: 1

Farkonix

Reputation: 1

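Let the browser normalize Unicode for you. This approach worked for me:

// Requires jQuery. Round-trips the string through a hidden, temporary <div>.
function normalizeUnicode(s) {
    let div = $('<div style="display: none"></div>').html(s).appendTo('body');
    let res = div.html(); // read the markup back as the browser serialized it
    div.remove();         // clean up the temporary element
    return res;
}

normalizeUnicode(unicodeVal1) == normalizeUnicode(unicodeVal2);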

Upvotes: 0

Daniel F

Reputation: 14239

I had this same problem.

Adding

<meta charset="UTF-8">

to the HTML file fixed the issue.

In my case the templating engine was baking a JSON string into the HTML file. This string was in Unicode.

While the template was also a Unicode file, the JS engine was treating the string I wrote into the template as a Latin-1 encoded string, until I added the meta tag.

I was comparing the typed-in string to one of the JSON object's items (location.title == "Mühle").
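Roughly what goes wrong in that situation can be simulated like this. It is only a sketch: the variable names are made up, and it assumes an environment where TextEncoder and TextDecoder are available as globals (modern browsers and current Node.js).

```javascript
const title = "M\u00FChle";                         // "Mühle"
const utf8Bytes = new TextEncoder().encode(title);  // ü becomes the two bytes 0xC3 0xBC

// Turning every byte into the character with the same number is what a
// Latin-1 decoder does, so this mimics reading UTF-8 data as Latin-1.
const misread = String.fromCharCode(...utf8Bytes);

console.log(misread);            // "MÃ¼hle"
console.log(misread === title);  // false

// Decoding the same bytes as UTF-8 gives the original string back.
console.log(new TextDecoder("utf-8").decode(utf8Bytes) === title);  // true
```

The misread string is one character longer than the original, which is why the comparison against "Mühle" fails until the page declares its charset.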

Upvotes: 0

Eric Leschinski

Reputation: 153922

The JavaScript equality operator == will appear to fail under the following circumstances. In all cases it is programmer error, not a bug in JavaScript.

  1. The two strings do not contain the same number and sequence of characters.

  2. There is whitespace or a newline before, within or after one string. Call trim() on both and look closely at both strings.

  3. Surprise typecasting. The programmer is comparing datatypes that are incompatible.

  4. There are Unicode characters which look identical to other Unicode characters but are in fact different Unicode characters.
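One way to check for cases 1, 2 and 4 is to print the code points of both strings and compare them side by side. The codePoints helper below is a hypothetical name of mine, not a built-in.

```javascript
// Hypothetical debugging helper: lists each code point of a string as U+XXXX.
function codePoints(s) {
    return Array.from(s, (ch) =>
        "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

console.log(codePoints("\u00E5").join(" "));   // U+00E5
console.log(codePoints("a\u030A").join(" "));  // U+0061 U+030A

// Look-alikes from different scripts show up the same way:
console.log(codePoints("A").join(" "));        // U+0041 (Latin capital A)
console.log(codePoints("\u0410").join(" "));   // U+0410 (Cyrillic capital A)

// Stray whitespace becomes visible too:
console.log(codePoints("a ").join(" "));       // U+0061 U+0020
```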

Upvotes: 6

slevithan

Reputation: 1414

Contrary to what some others here have said, this has nothing to do with encodings. Rather, your two strings use different code points to render the same visual characters.

To solve this correctly, you need to perform Unicode normalization on the two strings before comparing them. Unfortunately, JavaScript engines that predate ES2015 don't have this functionality built in (newer ones provide String.prototype.normalize()). Here is a JavaScript library that can perform the normalization for you: https://github.com/walling/unorm

Upvotes: 15
