Yanick Rochon

Why does Intl.Collator sort negative numbers in descending order?

I came across this use case and I am puzzled by it:

const naturalCollator = new Intl.Collator(undefined, {
  numeric: true,
  sensitivity: 'base'
});
const comparator = (a, b) => naturalCollator.compare(a, b);

const numbers = [-1, 0, 1, 10, NaN, 2, -0.001, NaN, 0, -1, -Infinity, NaN, 5, -10, Infinity, 0];

console.log(numbers.sort(comparator));

The resulting array lists negative numbers in descending order, but positive numbers in ascending order. For example:

[-3, 1, -2, 2].sort(comparator)
// [-2, -3, 1, 2]

Since Intl.Collator is a "language-sensitive string comparison", does it simply ignore the sign and evaluate every number as positive?

Edit

Another inconsistency is this one:

["b1", "a-1", "b-1", "a+1", "a1"].sort(comparator);
// ['a-1', 'a+1', 'a1', 'b-1', 'b1']

Here 'a' < 'b', so that part of the order is fine, but '-' > '+' by character code (45 vs. 43), so why is "a-1" before "a+1"?

In other words, a negative sign is considered less than a positive sign regardless of its character code; however, "-1" is considered less than "-2", as if the sign were ignored.
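The premise of the edit can be checked directly by comparing the code-unit order of the default sort with the collator's order. A minimal sketch; the variable names are illustrative, and the collator result assumes a typical ICU-backed engine (browsers, Node.js):

```javascript
// Raw UTF-16 code units of the two sign characters
console.log('-'.charCodeAt(0)); // 45 (U+002D HYPHEN-MINUS)
console.log('+'.charCodeAt(0)); // 43 (U+002B PLUS SIGN)

// Without a comparator, sort() orders by code units, so "+" comes first
const byCodeUnit = ['a-1', 'a+1'].sort();
console.log(byCodeUnit); // ['a+1', 'a-1']

// The collator uses its own collation order instead of code units
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' });
const byCollator = ['a-1', 'a+1'].sort(collator.compare);
console.log(byCollator); // ['a-1', 'a+1'] in ICU-based engines
```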

Answers (1)

jsejcksn

Without a comparator, Array.prototype.sort() converts values to strings and compares them by their UTF-16 code unit values. This is called "lexicographic sort".

Intl.Collator instead compares strings using locale-sensitive collation rules, and the options you pass (such as numeric and sensitivity) adjust how those rules are applied. You can think of the options as higher-priority rules layered on top of the basic comparison.
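The difference between the default code-unit sort and a collator shows up with a couple of small arrays. A minimal sketch; the 'en' locale and the variable names are assumptions made for the illustration:

```javascript
const letters = ['b', 'a', 'B'];
const nums = [10, 9, 1];

// No comparator: elements become strings, compared by UTF-16 code units
const lettersByCodeUnit = [...letters].sort(); // ['B', 'a', 'b'] ('B' is 66, 'a' is 97)
const numsByCodeUnit = [...nums].sort();       // [1, 10, 9] ("10" < "9" at the first character)

// Intl.Collator: locale-sensitive rules, adjusted by the options
const plain = new Intl.Collator('en');
const numeric = new Intl.Collator('en', { numeric: true });
const lettersByCollator = [...letters].sort(plain.compare); // ['a', 'b', 'B']
const numsByCollator = [...nums].sort(numeric.compare);     // [1, 9, 10]

console.log(lettersByCodeUnit, numsByCodeUnit, lettersByCollator, numsByCollator);
```

With the collator, letter identity decides first and case only breaks ties, which is why 'B' no longer jumps ahead of 'a'.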

Here's a link to the relevant spec section: https://tc39.es/ecma402/#sec-collator-comparestrings

When comparing number values (as in your example), the compare function first converts both arguments to strings, so -2 is actually compared as the string "-2".
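This coercion can be observed directly. The snippet below uses values from the question and shows the strings the collator ends up comparing; note that NaN and the infinities become the words "NaN" and "Infinity":

```javascript
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' });

// compare() converts both arguments with ToString first, so these agree
const sameResult = collator.compare(-2, -3) === collator.compare('-2', '-3');
console.log(sameResult); // true

// The strings the collator actually sees for some of the question's values
const asStrings = [-1, -0.001, NaN, -Infinity, Infinity].map(String);
console.log(asStrings); // ['-1', '-0.001', 'NaN', '-Infinity', 'Infinity']
```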

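The numeric option only changes how runs of digit characters are compared: each contiguous sequence of digits is compared by its numeric value instead of character by character.

In the case of your stringified negative values, the hyphens are treated as ordinary non-numeric characters, and only the contiguous digit sequences after them are compared as numbers. The minus sign is therefore not part of the number: "-2" and "-3" share the same leading "-", so the comparison is decided by 2 < 3, which puts -2 before -3. The sign is not ignored entirely, though: the "-" character itself sorts before digits, so all negative values still come before the non-negative ones.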
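To make the "digit runs" idea concrete, the helper below splits a string into runs of digits and non-digits. splitRuns is a hypothetical name, and this is only a simplified mental model of what the numeric option works with, not how engines implement collation; the collator calls then show the resulting order:

```javascript
// Hypothetical helper, for illustration only: splits a string into runs of
// digits and non-digits. A simplified model, not the engine's algorithm.
const splitRuns = (s) => s.match(/\d+|\D+/g) ?? [];

const negativeRuns = splitRuns(String(-10)); // ['-', '10']
const fileRuns = splitRuns('IMG_100.jpg');   // ['IMG_', '100', '.jpg']
console.log(negativeRuns, fileRuns);

const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' });

// Both strings start with the same '-', so the digit runs decide: 2 < 3
console.log(collator.compare(-2, -3) < 0); // true, so -2 sorts before -3

// The '-' is still a character that sorts before digits,
// so negative values come before non-negative ones
console.log(collator.compare(-2, 1) < 0); // true
```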

You can see the effect of this when sorting other strings which begin with hyphens alongside the numbers:

const opts = { numeric: true, sensitivity: 'base' };
const naturalCollator = new Intl.Collator(undefined, opts);

const values = [-3, 1, -2, 2, '-foo', '-bar', 'foo', 'bar'];

console.log(values.sort(naturalCollator.compare));
//=> [-2, -3, "-bar", "-foo", 1, 2, "bar", "foo"]
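If the goal is simply to get actual numbers in numeric order, the number values can be compared as numbers instead of being passed through the collator. A minimal sketch, assuming arrays that mix numbers and strings like the ones above; mixedCompare is a hypothetical name, and putting numbers before strings and NaN last is an arbitrary policy choice:

```javascript
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' });

// Hypothetical comparator: numbers are compared by value (NaN last),
// strings go through the collator, and numbers sort before strings.
function mixedCompare(a, b) {
  const aIsNum = typeof a === 'number';
  const bIsNum = typeof b === 'number';
  if (aIsNum && bIsNum) {
    // NaN is unordered under < and >, so place it explicitly at the end
    if (Number.isNaN(a)) return Number.isNaN(b) ? 0 : 1;
    if (Number.isNaN(b)) return -1;
    return a < b ? -1 : a > b ? 1 : 0;
  }
  if (aIsNum !== bIsNum) return aIsNum ? -1 : 1;
  return collator.compare(String(a), String(b));
}

const numbers = [-1, 0, 1, 10, NaN, 2, -0.001, NaN, 0, -1, -Infinity, NaN, 5, -10, Infinity, 0];
const sorted = [...numbers].sort(mixedCompare);
console.log(sorted);
// [-Infinity, -10, -1, -1, -0.001, 0, 0, 0, 1, 2, 5, 10, Infinity, NaN, NaN, NaN]

const mixedSorted = [-3, 1, -2, 2, '-foo', '-bar', 'foo', 'bar'].sort(mixedCompare);
console.log(mixedSorted); // [-3, -2, 1, 2, '-bar', '-foo', 'bar', 'foo']
```

Comparing with < and > rather than a - b keeps the comparator consistent for Infinity and NaN, since a - b returns NaN for Infinity - Infinity or any NaN operand.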


Another example of where the numeric option is useful: Consider a series of filenames with numeric substrings intended for grouped ordering:

const opts = { numeric: true, sensitivity: 'base' };
const naturalCollator = new Intl.Collator(undefined, opts);

const fileNames = [
  'IMG_1.jpg',
  'IMG_2.jpg',
  'IMG_3.jpg',
  // ...
  'IMG_100.jpg',
  'IMG_101.jpg',
  'IMG_102.jpg',
  // ...
  'IMG_200.jpg',
  'IMG_201.jpg',
  'IMG_202.jpg',
  // etc...
];

fileNames.sort();
console.log(fileNames); // 🙈
//=> ["IMG_1.jpg", "IMG_100.jpg", "IMG_101.jpg", "IMG_102.jpg", "IMG_2.jpg", "IMG_200.jpg", "IMG_201.jpg", "IMG_202.jpg", "IMG_3.jpg"]

fileNames.sort(naturalCollator.compare);
console.log(fileNames); // 🤩
//=> ["IMG_1.jpg", "IMG_2.jpg", "IMG_3.jpg", "IMG_100.jpg", "IMG_101.jpg", "IMG_102.jpg", "IMG_200.jpg", "IMG_201.jpg", "IMG_202.jpg"]
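The same options can also be passed to String.prototype.localeCompare, which takes (compareString, locales, options). A short sketch comparing the two forms; opts is redeclared here so the snippet runs on its own, and the file names are illustrative:

```javascript
const opts = { numeric: true, sensitivity: 'base' };
const naturalCollator = new Intl.Collator(undefined, opts);

// Both forms apply the numeric option: 2 < 100, so the result is negative
const viaLocaleCompare = 'IMG_2.jpg'.localeCompare('IMG_100.jpg', undefined, opts);
const viaCollator = naturalCollator.compare('IMG_2.jpg', 'IMG_100.jpg');
console.log(viaLocaleCompare < 0, viaCollator < 0); // true true

// sensitivity: 'base' ignores case differences, so these compare as equal
console.log(naturalCollator.compare('img_2.jpg', 'IMG_2.jpg')); // 0
```

For one-off comparisons either form works; for sorting large arrays, the MDN documentation recommends creating one Intl.Collator and reusing its compare function, as the example above does.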
