Rain Man

Reputation: 1273

Fast way to count and remove duplicates from an array

I have an array that contains duplicates

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

I want to get rid of the duplicates (case-insensitively) and create a new array that counts them.

In one of the answers, I saw this function:

function count_array(arr) {
    var a = [], b = [], prev;

    arr.sort();
    for ( var i = 0; i < arr.length; i++ ) {
        if ( arr[i] !== prev ) {
             a.push(arr[i]);
             b.push(1);
        } else {
             b[b.length-1]++;
        }
        prev = arr[i];
     }
     return [a, b];
 }

which returns two arrays:

First array: ["String 1", "String 2", "STRING 1", "String 3"]
Second array: [2, 2, 1, 1]

It is not case insensitive. I want all instances of String 1, STRING 1, string 1, StRING 1 to be treated as String 1.

Also, is there a better way to do this for large arrays, for example an array of 10K elements?

Upvotes: 1

Views: 1972

Answers (5)

jo_va

Reputation: 13983

This can be done succinctly using Array.reduce to create a map whose keys are the lowercased items of your array and whose values are their counts. Then get the unique items using Object.keys() and the counts using Object.values():

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"];

const map = array.reduce((acc, x) => {
  const xLower = x.toLocaleLowerCase();
  acc[xLower] = (acc[xLower] || 0) + 1;
  return acc;
}, {});

console.log(map);
console.log(Object.keys(map));
console.log(Object.values(map));
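
One caveat with a plain object as the map: it inherits keys such as constructor from Object.prototype, so an input string like "constructor" would not start counting from 0. A minimal sketch of the same idea using a Map, which has no inherited keys (countCaseInsensitive and the sample data are hypothetical, chosen only for illustration):

```javascript
// Sketch only: countCaseInsensitive is a hypothetical name, not part of the answer above.
function countCaseInsensitive(items) {
  // A Map has no inherited keys, so "constructor" is counted like any other string.
  const counts = new Map();
  for (const item of items) {
    const key = item.toLocaleLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const sample = ["String 1", "string 2", "STRING 1", "Constructor", "constructor"];
const sampleCounts = countCaseInsensitive(sample);

console.log([...sampleCounts.keys()]);   // ["string 1", "string 2", "constructor"]
console.log([...sampleCounts.values()]); // [2, 1, 2]
```

A Map also keeps its keys in insertion order, so the two arrays stay aligned.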

Upvotes: 0

Ori Drori

Reputation: 192857

Reduce the array of strings to an object, using the strings as keys and the number of appearances as values. Use Object.keys() to get the first array, and Object.values() for the second:

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

const counts = array.reduce((r, s) => {
  const key = s[0].toUpperCase() + s.substring(1).toLowerCase();
  
  r[key] = (r[key] || 0) + 1;
  
  return r;
}, {});

const first = Object.keys(counts);
const second = Object.values(counts);

console.log(first);
console.log(second);

To get the result sorted by the number of duplicates, use Object.entries() to convert the result of the reduce to an array of pairs. Sort by the 2nd value (the counts). To get the two arrays, use Array.map().

const array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]

const counts = Object.entries(array.reduce((r, s) => {
  const key = s[0].toUpperCase() + s.substring(1).toLowerCase();
  
  r[key] = (r[key] || 0) + 1;
  
  return r;
}, {}))
.sort(([, a], [, b]) => b - a);

const first = counts.map(([s]) => s);
const second = counts.map(([, n]) => n);

console.log(first);
console.log(second);
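
One edge case worth noting: s[0] is undefined for an empty string, so s[0].toUpperCase() throws. A minimal sketch, assuming empty strings may appear in the input, that uses charAt(0) (which returns "" instead) and a Map; toKey and countSorted are hypothetical names:

```javascript
// Sketch only: toKey and countSorted are hypothetical helpers.
// charAt(0) returns "" for an empty string, so this never throws.
const toKey = s => s.charAt(0).toUpperCase() + s.substring(1).toLowerCase();

function countSorted(items) {
  const counts = new Map();
  for (const s of items) {
    const key = toKey(s);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Sort the [key, count] pairs by count, highest first.
  return [...counts].sort(([, a], [, b]) => b - a);
}

const pairs = countSorted(["String 1", "", "string 2", "STRING 1", ""]);
console.log(pairs);
```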

Upvotes: 4

Nina Scholz

Reputation: 386786

You could take some functions and filter the normalized values while counting them.

const
    normalize = s => s.toLowerCase(),
    getFirst = a => a,
    mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
    array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
    map = new Map,
    array1 = array.filter(v => (k => getFirst(!map.has(k), mapCount(map, k)))(normalize(v))),
    array2 = Array.from(map.values());

console.log(array1);
console.log(array2);
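
The filter callback above does two things in one expression: it checks whether the normalized key is new, then updates the count. An unrolled sketch of the same logic, for readability (uniqueWithCounts is a hypothetical name):

```javascript
// Sketch only: uniqueWithCounts is a hypothetical name.
function uniqueWithCounts(items) {
  const counts = new Map();
  const unique = [];
  for (const value of items) {
    const key = value.toLowerCase();
    // Keep the original spelling only the first time its normalized key appears.
    if (!counts.has(key)) unique.push(value);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [unique, Array.from(counts.values())];
}

const [kept, keptCounts] = uniqueWithCounts(
  ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
);
console.log(kept);       // ["String 1", "string 2", "String 3"]
console.log(keptCounts); // [3, 2, 1]
```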

If you are satisfied with normalized strings as the result set, you could take this approach.

const
    normalize = s => s.toLowerCase(),
    mapCount = (m, k) => m.set(k, (m.get(k) || 0) + 1),
    array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"],
    map = array.reduce((m, v) => mapCount(m, normalize(v)), new Map),
    array1 = Array.from(map.keys()),
    array2 = Array.from(map.values());

console.log(array1);
console.log(array2);

Upvotes: 1

Artem

Reputation: 2047

If you are asking about the fastest way to do this, it should run in O(N) time asymptotically:

  1. Firstly, you need a hash map to store every string seen so far;
  2. Secondly, you need to iterate over the given array, placing each value into the hash map;
  3. Finally, you need to increment the string's count in the hash map every time it is met again.

It can be implemented like so:

function countAndFilter(arr) {
    // Object.create(null) has no inherited keys, so a string such as "constructor" is counted correctly
    const map = Object.create(null);

    for (let i = 0; i < arr.length; i++) {
        const str = arr[i].toLowerCase();

        if (str in map) {
            map[str]++;

            // keep in mind that removing an element from an array costs O(N),
            // so mark the duplicate here and filter once at the end
            arr[i] = undefined;
        } else {
            map[str] = 1;
        }
    }

    // map now holds every lowercased string and its number of appearances in the given array;
    // the filtered array keeps only the first occurrence of each string
    return [arr.filter(str => str !== undefined), map];
}

Upvotes: 0

CertainPerformance

Reputation: 371138

.sort() is an O(N log N) process, so if speed is a concern and you need sorted results, sort at the very end. If you don't need to sort the results, use a Set (or a Map) to check for duplicates instead of checking a sorted array for similar items at adjacent indices.

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
function count_array(arr) {
  const result = [];
  const map = new Map();
  arr.forEach((str) => {
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  });
  console.log([...map.values()]);
  return result.sort();
}
console.log(count_array(array));

You can use a for loop instead of forEach if you want. A for loop will be slightly faster, though a bit harder to read IMO:

array = ["String 1", "string 2", "STRING 1", "String 2", "String 3", "String 1"]
function count_array(arr) {
  const result = [];
  const map = new Map();
  for (let i = 0, { length } = arr; i < length; i++) {
    const str = arr[i];
    const lower = str.toLowerCase();
    const currCount = map.get(lower) || 0;
    if (!currCount) {
      result.push(str);
    }
    map.set(lower, currCount + 1);
  }
  console.log([...map.values()]);
  return result.sort();
}
console.log(count_array(array));
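
To see how the two strategies compare on an array of 10K strings, something like the following rough harness could be used. It is only a sketch: makeInput, countWithSort and countWithMap are hypothetical helpers, the generated data is synthetic, and the timings will vary by engine and machine.

```javascript
// Sketch only: all three helpers are hypothetical, written for this comparison.
function makeInput(size, distinct) {
  const out = [];
  for (let i = 0; i < size; i++) {
    const label = "String " + (i % distinct);
    // Alternate casing so the case-insensitive path is exercised.
    out.push(i % 2 ? label.toUpperCase() : label);
  }
  return out;
}

// Sort first, then count runs of equal neighbours: O(N log N).
function countWithSort(arr) {
  const keys = [], counts = [];
  let prev;
  for (const s of arr.map(x => x.toLowerCase()).sort()) {
    if (s !== prev) { keys.push(s); counts.push(1); }
    else counts[counts.length - 1]++;
    prev = s;
  }
  return [keys, counts];
}

// Single pass with a Map: O(N).
function countWithMap(arr) {
  const map = new Map();
  for (const s of arr) {
    const lower = s.toLowerCase();
    map.set(lower, (map.get(lower) || 0) + 1);
  }
  return [[...map.keys()], [...map.values()]];
}

const input = makeInput(10000, 100);

console.time("sort-based");
const bySort = countWithSort(input);
console.timeEnd("sort-based");

console.time("map-based");
const byMap = countWithMap(input);
console.timeEnd("map-based");
```

For a one-off run on 10K items either approach is likely fast enough; the difference matters more as the array grows or the function is called repeatedly.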

Upvotes: 2
