Reputation: 1273
I have an array that contains duplicates:
array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
I want to get rid of duplicates (case-insensitively) and create a new array that counts the duplicates.
In one of the answers, I saw this function:
function count_array(arr) {
var a = [], b = [], prev;
arr.sort();
for ( var i = 0; i < arr.length; i++ ) {
if ( arr[i] !== prev ) {
a.push(arr[i]);
b.push(1);
} else {
b[b.length-1]++;
}
prev = arr[i];
}
return [a, b];
}
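which returns two arrays. For the array above, the default sort() compares UTF-16 code units, so uppercase letters sort before lowercase ones:
First array: ["STRING 1", "String 1", "String 2", "String 3", "string 2"]
Second array: [1, 2, 1, 1, 1]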
It is not case-insensitive. I want all instances of String 1, STRING 1, string 1, StRING 1 to be considered as String 1.
Also, is there a better way to do this for large arrays, for example an array of length 10K?
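For reference, the sort-based function above can be made case-insensitive by sorting on, and comparing, the lowercased value. This is a minimal sketch and not part of the original question. `countArrayCaseInsensitive` is a hypothetical name. It sorts a copy, so the input is not mutated, and it is still O(N log N) because of the sort. Each group keeps the first spelling the sort puts in front, which relies on Array.prototype.sort being stable (ES2019 and later).

```javascript
// Hypothetical helper: the question's count_array, adapted to ignore case.
function countArrayCaseInsensitive(arr) {
  const a = []; // one spelling per group
  const b = []; // count per group, aligned with a
  let prev;
  // Sort a copy by lowercased value so strings that differ only in case end up adjacent
  const sorted = [...arr].sort((x, y) => {
    const lx = x.toLowerCase();
    const ly = y.toLowerCase();
    return lx < ly ? -1 : lx > ly ? 1 : 0;
  });
  for (const s of sorted) {
    const key = s.toLowerCase();
    if (key !== prev) {
      // First string of a new group: keep its spelling and start counting
      a.push(s);
      b.push(1);
    } else {
      b[b.length - 1]++;
    }
    prev = key;
  }
  return [a, b];
}

const input = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];
console.log(countArrayCaseInsensitive(input));
```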
Upvotes: 1
Views: 1972
Reputation: 13983
This can be done succinctly using Array.reduce to create a map whose keys are the lowercased items of your array and whose values are their counts. Then get the unique items using Object.keys() and the counts using Object.values():
const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];
const map = array.reduce((acc, x) => {
const xLower = x.toLocaleLowerCase();
acc[xLower] = (acc[xLower] || 0) + 1;
return acc;
}, {});
console.log(map);
console.log(Object.keys(map));
console.log(Object.values(map));
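One caveat with a plain {} accumulator: it inherits properties from Object.prototype, so an input string such as "constructor" would read the inherited Object.prototype.constructor function instead of undefined, and its "count" would come out as a string rather than a number. A minimal sketch of the same reduce with an Object.create(null) accumulator, which has no inherited keys, is shown below. The `words` input is made up for illustration.

```javascript
// Same reduce as above, but the accumulator has no prototype,
// so keys like "constructor" or "toString" start out undefined.
const words = ["String 1", "string 2", "constructor", "STRING 1", "Constructor"];
const counts = words.reduce((acc, x) => {
  const key = x.toLocaleLowerCase();
  acc[key] = (acc[key] || 0) + 1;
  return acc;
}, Object.create(null));

console.log(Object.keys(counts));   // ["string 1", "string 2", "constructor"]
console.log(Object.values(counts)); // [2, 1, 2]
```

A Map (as in some of the other answers) avoids the same problem.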
Upvotes: 0
Reputation: 192857
Reduce the array of strings to an object, using a normalized form of each string (first character uppercased, the rest lowercased) as the key and the number of appearances as the value. Use Object.keys() to get the first array, and Object.values() for the second:
const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
const counts = array.reduce((r, s) => {
// charAt(0) is "" for an empty string, whereas s[0] would be undefined and throw
const key = s.charAt(0).toUpperCase() + s.substring(1).toLowerCase();
r[key] = (r[key] || 0) + 1;
return r;
}, {});
const first = Object.keys(counts);
const second = Object.values(counts);
console.log(first);
console.log(second);
To get the result sorted by the number of duplicates, use Object.entries() to convert the result of the reduce into an array of [key, count] pairs, and sort by the second value (the count). To get the two arrays, use Array.map():
const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
const counts = Object.entries(array.reduce((r, s) => {
// charAt(0) is "" for an empty string, whereas s[0] would be undefined and throw
const key = s.charAt(0).toUpperCase() + s.substring(1).toLowerCase();
r[key] = (r[key] || 0) + 1;
return r;
}, {}))
.sort(([, a], [, b]) => b - a);
const first = counts.map(([s]) => s);
const second = counts.map(([, n]) => n);
console.log(first);
console.log(second);
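If several strings have the same count, the comparator above returns 0 and they keep their insertion order. A small sketch of breaking ties alphabetically with String.prototype.localeCompare follows; the `tieArray` input is made up to produce a tie.

```javascript
// Made-up input: "A" and "B" both appear twice (ignoring case)
const tieArray = ["b", "A", "a", "B", "c"];
const sortedCounts = Object.entries(tieArray.reduce((r, s) => {
  // Same normalization as the answer: first character uppercased, rest lowercased
  const key = s.charAt(0).toUpperCase() + s.substring(1).toLowerCase();
  r[key] = (r[key] || 0) + 1;
  return r;
}, {}))
  // Higher count first; when counts are equal (b - a is 0), compare the strings
  .sort(([sa, a], [sb, b]) => b - a || sa.localeCompare(sb));

const names = sortedCounts.map(([s]) => s);
const nums = sortedCounts.map(([, n]) => n);
console.log(names); // ["A", "B", "C"]
console.log(nums);  // [2, 2, 1]
```

Without the `|| sa.localeCompare(sb)` part, "B" would stay ahead of "A" because it was inserted first.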
Upvotes: 4
Reputation: 386786
You could use a few helper functions and filter the array while counting the normalized values in a Map. The filter keeps only the first spelling of each string:
const
normalize = s => s.toLowerCase(),
getFirst = a => a,
mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
map = new Map,
array1 = array.filter(v => (k => getFirst(!map.has(k), mapCount(map, k)))(normalize(v))),
array2 = Array.from(map.values());
console.log(array1);
console.log(array2);
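The getFirst(!map.has(k), mapCount(map, k)) call works because both arguments are evaluated left to right: the "seen before?" test runs before the count is updated, and getFirst returns only that first result. A sketch of the same idea without the helper, storing that test in a local variable, may be easier to read. The names `sample`, `seen`, `firstSpellings`, and `spellingCounts` are illustrative.

```javascript
const sample = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];
const seen = new Map(); // lowercased string -> count

const firstSpellings = sample.filter((v) => {
  const k = v.toLowerCase();
  const isFirst = !seen.has(k);         // check before updating the count
  seen.set(k, (seen.get(k) || 0) + 1);  // count every occurrence
  return isFirst;                       // keep only the first spelling
});
const spellingCounts = Array.from(seen.values());

console.log(firstSpellings); // ["String 1", "string 2", "String 3"]
console.log(spellingCounts); // [3, 2, 1]
```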
If you are satisfied with normalized (lowercased) strings as the result set, you could take this approach:
const
normalize = s => s.toLowerCase(),
mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
map = array.reduce((m, v) => mapCount(m, normalize(v)), new Map),
array1 = Array.from(map.keys()),
array2 = Array.from(map.values());
console.log(array1);
console.log(array2);
Upvotes: 1
Reputation: 2047
If you are asking about the fastest way to do this, it can be done in O(N) expected time with a single pass that counts the strings in a hash map. It can be implemented like so:
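function count_array(arr) {
  // Object.create(null) has no inherited keys, so `in` won't match strings like "constructor"
  const map = Object.create(null);
  for (let i = 0; i < arr.length; i++) {
    const str = arr[i].toLowerCase();
    if (str in map) {
      map[str]++;
      // keep in mind that removing an element from an array (e.g. with splice) costs O(N), so mark it instead
      arr[i] = undefined;
    } else {
      map[str] = 1;
    }
  }
  // now map holds every lowercased string and its number of appearances in the given array
  doSomething(map); // placeholder: do whatever you need with the counts
  // finally return the filtered result (note that arr itself has been modified in place)
  return arr.filter(str => str !== undefined);
}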
Upvotes: 0
Reputation: 371138
.sort() is an O(N log N) operation. If speed is something you're worried about and you need the results sorted, sort at the very end. If you don't need the results sorted, use a Set (or a Map) to check for duplicates instead of sorting the array and comparing items at adjacent indices.
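const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];
function count_array(arr) {
  const result = []; // first spelling of each distinct string
  const map = new Map(); // lowercased string -> count
  arr.forEach((str) => {
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  });
  // Sort only at the very end, then look the counts up so they stay aligned with result
  result.sort();
  const counts = result.map((str) => map.get(str.toLowerCase()));
  return [result, counts];
}
console.log(count_array(array));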
You can use a for loop instead of forEach if you want. A for loop will be slightly faster, though a bit harder to read IMO:
const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];
function count_array(arr) {
  const result = []; // first spelling of each distinct string
  const map = new Map(); // lowercased string -> count
  for (let i = 0, { length } = arr; i < length; i++) {
    const str = arr[i];
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  }
  // Sort only at the very end, then look the counts up so they stay aligned with result
  result.sort();
  const counts = result.map((str) => map.get(str.toLowerCase()));
  return [result, counts];
}
console.log(count_array(array));
Upvotes: 2