Reputation: 2210
I'm going through a code review and I'm curious if it's better to convert strings to upper or lower case in JavaScript when attempting to compare them while ignoring case.
Trivial example:
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
or should I do this:
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();
It seems like either approach should work with limited character sets, such as English letters only, so is one more robust than the other?
As a note, MSDN recommends normalizing strings to uppercase, but that is for managed code (presumably C# and F#, which have fancy StringComparers and base libraries):
http://msdn.microsoft.com/en-us/library/bb386042.aspx
Upvotes: 27
Views: 18624
Reputation: 8567
If you don't want to use a locale-based solution, you can simply do:
const areStringsEqual = (a, b) =>
  a.toLowerCase().toUpperCase() === b.toLowerCase().toUpperCase()
This will work correctly for all Unicode code points.
(Note: the reverse, a.toUpperCase().toLowerCase(), does not work due to one odd phenomenon: lowercasing ẞ results in ß, but uppercasing ß results in SS!)
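The asymmetry is easy to check in a console. The following is a minimal sketch, assuming a Unicode-aware engine; the helper names lowerThenUpper and upperThenLower are made up for this illustration and are not part of the answer above:

```javascript
// Hypothetical helper names, used only for this illustration.
const lowerThenUpper = (s) => s.toLowerCase().toUpperCase();
const upperThenLower = (s) => s.toUpperCase().toLowerCase();

const capitalSharpS = '\u1E9E'; // ẞ
const smallSharpS = '\u00DF';   // ß

// Lowercasing first maps ẞ to ß, and both then uppercase to "SS".
console.log(lowerThenUpper(capitalSharpS) === lowerThenUpper(smallSharpS));

// Uppercasing first leaves ẞ unchanged but expands ß to "SS",
// so after lowercasing the two results are "ß" and "ss".
console.log(upperThenLower(capitalSharpS) === upperThenLower(smallSharpS));
```

The first comparison should log true and the second false, which is why the order of the two calls matters.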
Bonus:
This is particularly helpful in TypeScript if you want to compare constant string types at compile time, since there are no locale-based utility types!
type AreStringsEqual<A extends string, B extends string> =
Uppercase<Lowercase<A>> extends Uppercase<Lowercase<B>> ? true : false
Upvotes: 0
Reputation: 407
It never depends on the browser, as only JavaScript is involved. Both will perform according to the number of characters that need to be changed (flipping case):
var areStringsEqual = firstString.toLowerCase() === secondString.toLowerCase();
var areStringsEqual = firstString.toUpperCase() === secondString.toUpperCase();
If you use the test prepared by @adeneo, it may look browser dependent, but try some other test inputs, such as:
"AAAAAAAAAAAAAAAAAAAAAAAAAAAA"
and
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
and compare.
JavaScript performance depends on the browser when a DOM API or any DOM manipulation/interaction is involved; otherwise, for plain JavaScript, it will give the same performance.
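One way to try such inputs yourself is a small timing loop. This is only a rough sketch: the helper name timeComparison, the input length, and the iteration count are arbitrary choices for illustration, Date.now() has coarse resolution, and whether the numbers differ between engines is something to measure rather than assume:

```javascript
// Hypothetical micro-benchmark; input length and iteration count are arbitrary.
function timeComparison(normalize, a, b, iterations) {
  const start = Date.now();
  let equalCount = 0;
  for (let i = 0; i < iterations; i++) {
    // Normalize both sides on every iteration, as the one-line comparisons do.
    if (normalize(a) === normalize(b)) equalCount++;
  }
  return { elapsedMs: Date.now() - start, equalCount };
}

const upperInput = 'A'.repeat(28);
const lowerInput = 'a'.repeat(28);

const viaLower = timeComparison((s) => s.toLowerCase(), upperInput, lowerInput, 100000);
const viaUpper = timeComparison((s) => s.toUpperCase(), upperInput, lowerInput, 100000);

console.log('toLowerCase:', viaLower.elapsedMs, 'ms');
console.log('toUpperCase:', viaUpper.elapsedMs, 'ms');
```

Run it in each browser or runtime you care about and compare the timings directly.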
Upvotes: -3
Reputation: 1
Some other options have been presented, but if you must use toLowerCase or toUpperCase, I wanted some actual data on this. I pulled the full list of two byte characters that fail with toLowerCase or toUpperCase. I then ran this test:
let pairs = [
[0x00E5,0x212B],[0x00C5,0x212B],[0x0399,0x1FBE],[0x03B9,0x1FBE],[0x03B2,0x03D0],
[0x03B5,0x03F5],[0x03B8,0x03D1],[0x03B8,0x03F4],[0x03D1,0x03F4],[0x03B9,0x1FBE],
[0x0345,0x03B9],[0x0345,0x1FBE],[0x03BA,0x03F0],[0x00B5,0x03BC],[0x03C0,0x03D6],
[0x03C1,0x03F1],[0x03C2,0x03C3],[0x03C6,0x03D5],[0x03C9,0x2126],[0x0392,0x03D0],
[0x0395,0x03F5],[0x03D1,0x03F4],[0x0398,0x03D1],[0x0398,0x03F4],[0x0345,0x1FBE],
[0x0345,0x0399],[0x0399,0x1FBE],[0x039A,0x03F0],[0x00B5,0x039C],[0x03A0,0x03D6],
[0x03A1,0x03F1],[0x03A3,0x03C2],[0x03A6,0x03D5],[0x03A9,0x2126],[0x0398,0x03F4],
[0x03B8,0x03F4],[0x03B8,0x03D1],[0x0398,0x03D1],[0x0432,0x1C80],[0x0434,0x1C81],
[0x043E,0x1C82],[0x0441,0x1C83],[0x0442,0x1C84],[0x0442,0x1C85],[0x1C84,0x1C85],
[0x044A,0x1C86],[0x0412,0x1C80],[0x0414,0x1C81],[0x041E,0x1C82],[0x0421,0x1C83],
[0x1C84,0x1C85],[0x0422,0x1C84],[0x0422,0x1C85],[0x042A,0x1C86],[0x0463,0x1C87],
[0x0462,0x1C87]
];
let upper = 0, lower = 0;
for (let pair of pairs) {
  let row = 'U+' + pair[0].toString(16).padStart(4, '0') + ' ';
  row += 'U+' + pair[1].toString(16).padStart(4, '0') + ' pass: ';
  let s = String.fromCodePoint(pair[0]);
  let t = String.fromCodePoint(pair[1]);
  if (s.toUpperCase() == t.toUpperCase()) {
    row += 'toUpperCase ';
    upper++;
  } else {
    // pad to the width of 'toUpperCase ' so the columns stay aligned
    row += ' '.repeat(12);
  }
  if (s.toLowerCase() == t.toLowerCase()) {
    row += 'toLowerCase';
    lower++;
  }
  console.log(row);
}
console.log('upper pass: ' + upper + ', lower pass: ' + lower);
Interestingly, one of the pairs fails with both. But based on this, toUpperCase is the best option.
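To make the possible outcomes concrete, here is a sketch that checks three pairs taken from the list above. The helper name compareBothWays is made up for this illustration, and the choice of pairs is an assumption on my part: by my reading of the Unicode case mappings, the first pair matches only via toUpperCase, the second only via toLowerCase, and the third via neither.

```javascript
// Hypothetical helper: reports which single-method comparison treats
// the two code points as equal.
function compareBothWays(cp1, cp2) {
  const s = String.fromCodePoint(cp1);
  const t = String.fromCodePoint(cp2);
  return {
    upper: s.toUpperCase() === t.toUpperCase(),
    lower: s.toLowerCase() === t.toLowerCase(),
  };
}

console.log(compareBothWays(0x00B5, 0x03BC)); // micro sign vs Greek small mu
console.log(compareBothWays(0x00E5, 0x212B)); // a with ring above vs angstrom sign
console.log(compareBothWays(0x03D1, 0x03F4)); // Greek theta symbol vs capital theta symbol
```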
Upvotes: 6
Reputation: 18662
It's been quite a while since I answered this question. While the cultural issues still hold true (and I don't think they will ever go away), the development of the ECMA-402 standard made my original answer... outdated (or obsolete?).
The best solution for comparing localized strings seems to be using the localeCompare() function with appropriate locales and options:
var locale = 'en'; // that should be somehow detected and passed on to JS
var firstString = "I might be A different CASE";
var secondString = "i might be a different case";
if (firstString.localeCompare(secondString, locale, {sensitivity: 'accent'}) === 0) {
  // do something when equal
}
This will compare the two strings case-insensitively but accent-sensitively (for example, ą != a).
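When many comparisons are needed, the same options can be given to an Intl.Collator that is created once and reused. This is a sketch under assumptions: the 'en' locale is hard-coded here purely for illustration, and the helper name equalsIgnoringCase is made up.

```javascript
// Assumed locale for this sketch; in practice it would be detected and passed in.
const collatorLocale = 'en';
// 'accent' sensitivity: case differences are ignored, accent differences are kept.
const collator = new Intl.Collator(collatorLocale, { sensitivity: 'accent' });

// Hypothetical helper name.
const equalsIgnoringCase = (a, b) => collator.compare(a, b) === 0;

console.log(equalsIgnoringCase('I might be A different CASE', 'i might be a different case'));
console.log(equalsIgnoringCase('a', '\u0105')); // a vs ą
```

The first call should report the strings as equal, while the second should not, since only the accent differs.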
If this is not sufficient for performance reasons, you may want to use either toLocaleUpperCase() or toLocaleLowerCase(), passing the locale as a parameter:
if (firstString.toLocaleUpperCase(locale) === secondString.toLocaleUpperCase(locale)) {
  // do something when equal
}
In theory there should be no differences. In practice, subtle implementation details (or lack of implementation in the given browser) may yield different results...
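Whichever of the two you pick, the locale argument matters because case mappings differ between languages. Below is a small sketch using Turkish, chosen here only as a well-known example; it assumes the runtime ships the usual locale-aware case mappings, and the sample word is arbitrary.

```javascript
const sampleWord = 'title';

// With an English locale, i uppercases to the ASCII I.
const englishUpper = sampleWord.toLocaleUpperCase('en');
// With a Turkish locale, i uppercases to İ (U+0130, capital I with dot above).
const turkishUpper = sampleWord.toLocaleUpperCase('tr');

console.log(englishUpper, turkishUpper, englishUpper === turkishUpper);
```

The same input yields two different uppercase strings, so both sides of a comparison need to be converted with the same locale.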
I am not sure if you really meant to ask this question under the Internationalization (i18n) tag, but since you did...
Probably the most unexpected answer is: neither.
There are tons of problems with case conversion, which inevitably lead to functional issues if you want to convert the character case without indicating the language (as is the case in JavaScript). For instance:
I am trying to convince you that it is really better to compare user input literally, rather than converting it. If it is not user-related, it probably doesn't matter, but case conversion will always take time. Why bother?
Upvotes: 32