Reputation:
What is the most professional way to obtain a case insensitive count of the distinct words contained in an array using plain javascript? I have done the first attempt myself but does not feel much professional.
I would like the result to be a Map
Upvotes: 1
Views: 698
Reputation: 12929
While the accepted 'group-by' operation is fine, it doesn't address the complexity of case-insensitive/unicode comparison.
First of all, you can reduce directly into a Map, here counting characters as they are without accounting for case-insensitivity or unicode variations resulting in 20 'distinct' characters being counted from an array of length 24.
const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
const result = input.reduce((a, b) => a.set(b, (a.get(b) ?? 0) + 1), new Map());
console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
Based on the samples below, the method that results in the most compact count is using word.normalize().toLocaleUpperCase()
and passing Turkey('tr') as a locale for this specific sample array. It results in 9 'distinct' characters being counted from an array of length 24, properly handling different encodings for ñ
, equivalent spellings of Gesäß
(GESÄSS
), and accounting for locale specific case changes (i
to İ
)
const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
const result_normalize_locale = input.reduce((a, b) => {
const w = b.normalize().toLocaleUpperCase('tr');
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
Using this simple 'group-by' we can look at the variations between the available case comparison methods: toLowerCase()
, toLocaleLowerCase()
, toUpperCase()
, and toLocaleUpperCase()
and unicode variations can be accounted for using normalize()
To lower case
toLowerCase()
– 15 'distinct' characters.
toLocaleLowerCase()
– 14 'distinct' characters, in this case specifying Turkey('tr') as locale.
normalize().toLocaleLowerCase()
– 12 'distinct' characters, again with 'tr' as locale.
const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
// ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ]
// input.length: 24
// grouping by toLowerCase()
const result = input.reduce((a, b) => {
const w = b.toLowerCase();
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// grouping by toLocaleLowerCase('tr') [Turkey]
const result_locale = input.reduce((a, b) => {
const w = b.toLocaleLowerCase('tr');
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// grouping by normalize().toLocaleLowerCase('tr') [Turkey]
const result_normalize_locale = input.reduce((a, b) => {
const w = b.normalize().toLocaleLowerCase('tr');
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// log toLowerCase() result - 15 'distinct' characters
console.log('toLowerCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
// log toLocaleLowerCase('tr') result - 14 'distinct' characters
console.log("\ntoLocaleLowerCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
// log normalize().toLocaleLowerCase('tr') result - 12 'distinct' characters
console.log("\nnormalize().toLocaleLowerCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
.as-console-wrapper { max-height: 100% !important; top: 0; }
To upper case
toUpperCase()
– 12 'distinct' characters.
toLocaleUpperCase()
– 11 'distinct' characters, in this case specifying Turkey('tr') as locale.
normalize().toLocaleUpperCase()
– 9 'distinct' characters, again with 'tr' as locale.
const input = [ 'a', 'A', 'b', 'B', '\u00F1', '\u006E\u0303', 'İ', 'i', 'Gesäß',
'GESÄSS', '\u0399', '\u1FBE', '\u00E5', '\u212B', '\u00C5', '\u212B', '\u0399', '\u1FBE', '\u03B9', '\u1FBE', '\u03B2', '\u03D0', '\u03B5', '\u03F5', ];
// ['a', 'A', 'b', 'B', 'ñ', 'ñ', 'İ', 'i', 'Gesäß', 'GESÄSS', 'Ι', 'ι', 'å', 'Å', 'Å', 'Å', 'Ι', 'ι', 'ι', 'ι', 'β', 'ϐ', 'ε', 'ϵ', ]
// input.length: 24
// grouping by toUpperCase()
const result = input.reduce((a, b) => {
const w = b.toUpperCase();
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// grouping by toLocaleUpperCase('tr') [Turkey]
const result_locale = input.reduce((a, b) => {
const w = b.toLocaleUpperCase('tr');
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// grouping by normalize().toLocaleUpperCase('tr') [Turkey]
const result_normalize_locale = input.reduce((a, b) => {
const w = b.normalize().toLocaleUpperCase('tr');
return a.set(w, (a.get(w) ?? 0) + 1);
}, new Map());
// log toUpperCase() result - 12 'distinct' characters
console.log('toUpperCase() '); console.log('distinct count: ', result.size); console.log('Map(',result.size,') {', [...result.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
// log toLocaleUpperCase('tr') result - 11 'distinct' characters
console.log("\ntoLocaleUpperCase('tr')"); console.log('distinct count: ', result_locale.size); console.log('Map(',result_locale.size,') {', [...result_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
// log normalize().toLocaleUpperCase('tr') result - 9 'distinct' characters
console.log("\nnormalize().toLocaleUpperCase('tr')"); console.log('distinct count: ', result_normalize_locale.size); console.log('Map(',result_normalize_locale.size,') {', [...result_normalize_locale.entries()].map(([k, v]) => `${k} => ${v}`).join(', '), '}');
.as-console-wrapper { max-height: 100% !important; top: 0; }
Upvotes: 0
Reputation: 18116
You can use an object to store the results and then create a Map
object by passing that object to Object.entries
const arr = ["c", "A", "C", "B", "b"];
const counts = {};
for (const el of arr) {
let c = el.toLowerCase();
counts[c] = counts[c] ? ++counts[c] : 1;
}
console.log(counts);
const map = new Map(Object.entries(counts))
map.forEach((k,v) => console.log(k,v))
Upvotes: 0
Reputation: 31992
You can use Array.reduce
to store each word as a property and the occurrence of each as the value.
In the reducer function, check whether the letter (converted to lowercase) exists as a property. If not, set its value to 1
. Otherwise, increment the property value.
const arr = ["a", "A", "b", "B"]
const result = arr.reduce((a,b) => {
let c = b.toLowerCase();
return a[c] = a[c] ? ++a[c] : 1, a;
}, {})
console.log(result)
const result = arr.reduce((a,b) => (c = b.toLowerCase(), a[c] = a[c] ? ++a[c] : 1, a), {})
To convert it to a Map
, you can use Object.entries
(sugged by @Théophile):
const arr = ["a", "A", "b", "B"]
const result = arr.reduce((a, b) => {
let c = b.toLowerCase();
return a[c] = a[c] ? ++a[c] : 1, a;
}, {})
const m = new Map(Object.entries(result))
m.forEach((value, key) => console.log(key, ':', value))
Upvotes: 2
Reputation: 5304
use set to get rid of duplicates and the spread operator to put it back in an array.
const myarray = ['one', 'One', 'two', 'TWO', 'three'];
const noDupes = [... new Set( myarray.map(x => x.toLowerCase()))];
console.log(noDupes);
Upvotes: -1