brienna

Reputation: 1604

Filter duplicates, including original value, from array of objects based on multiple properties

data is a list of objects. We want to extract all duplicates, including the original (first) occurrence, based on multiple object properties.

My code correctly finds the later duplicates based on multiple object properties, but how can we adjust it to include the original value as well?

The goal is to end up with a list of these duplicates.

const data = [{
  name: 'x',
  latitude: '45.9',
  longitude: '50.2'
}, {
  name: 'y',
  latitude: '45.9',
  longitude: '50.2'
}, {
  name: 'z',
  latitude: '40.5',
  longitude: '85.7'
}];

const duplicates = data
  .filter((obj, index, array) =>
    array.findIndex(o =>
      o.latitude === obj.latitude &&
      o.longitude === obj.longitude
    ) != index
  );

console.log(duplicates);

Output:

[{
  name: 'y',
  latitude: '45.9',
  longitude: '50.2'
}]

Desired output:

[{
  name: 'x',
  latitude: '45.9',
  longitude: '50.2'
}, {
  name: 'y',
  latitude: '45.9',
  longitude: '50.2'
}]

Upvotes: 3

Views: 2107

Answers (3)

Scott Sauyet

Reputation: 50807

A simple fix to your code might look like the following:

const duplicates = (data) => data
  .filter((obj, index, array) =>
    array.find((o, i) =>
      o.latitude === obj.latitude &&
      o.longitude === obj.longitude &&
      i != index
    )
  );

We simply need to test for mismatched indices inside the find callback.
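
For completeness, a quick usage sketch, applying the function form above to the data array from the question:

console.log(duplicates(data));
// => [{name: 'x', latitude: '45.9', longitude: '50.2'},
//     {name: 'y', latitude: '45.9', longitude: '50.2'}]
// z is dropped, since no other entry shares its coordinates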

But I think there is much to be gained by separating the filtering/dup-checking logic from the code that tests whether two elements are equal. The breakdown is more logical, and we get a potentially reusable function out of it.

So I might write it like this:

const keepDupsBy = (eq) => (xs) => xs .filter (
  (x, i) => xs .find ((y, j) => i !== j && eq (x, y))
)

const dupLocations = keepDupsBy ((a, b) =>
  a .latitude == b .latitude &&
  a .longitude == b .longitude
)

const data = [{name: 'x', latitude: '45.9', longitude: '50.2'}, {name: 'y', latitude: '45.9', longitude: '50.2'}, {name: 'z', latitude: '40.5', longitude: '85.7'}];

console .log (dupLocations (data))

This keeps all the elements of the original array that have duplicates elsewhere, returned in their relative order from the original array. That's the same order as the output above, but different from the interesting approach in Peter Seliger's answer, which groups all the matching values together, ordered by the first element of each group.

Note too the performance difference if you're expecting to use this on large lists. Your original and all the answers but Peter's operate in O(n^2) time; Peter's operates in O(n). For larger lists, the difference could be substantial. The tradeoff is different when it comes to memory: Peter's uses O(n) additional memory, while all the others here use constant, O(1), additional memory. None of this is likely to make a difference unless you're working with tens of thousands of elements or more, but it's often worth considering.
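
To make the tradeoff concrete, here is a minimal sketch of an O(n) two-pass variant (not from any of the posted answers; keepDupsByKey and dupsByLocation are my own hypothetical names): it counts string keys in a first pass with a Map, then keeps every element whose key occurs more than once.

// Sketch: two-pass O(n) duplicate-keeping, assuming each item
// can be reduced to a string key.
const keepDupsByKey = (toKey) => (xs) => {
  const counts = new Map();
  for (const x of xs) {
    const key = toKey(x);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // second pass: keep elements whose key occurred more than once
  return xs.filter((x) => counts.get(toKey(x)) > 1);
};

const dupsByLocation = keepDupsByKey(
  ({latitude, longitude}) => `${latitude}/${longitude}`
);

console.log(dupsByLocation(data)); // the x and y entries, in original order

Like the filter versions, this preserves the original relative order; the Map is where the O(n) additional memory goes.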

Upvotes: 1

Peter Seliger

Reputation: 13417

This reduce-based approach detects duplicates by a geo-coordinate signature: a string key concatenated from each item's latitude and longitude values. (Since the key is built with parseFloat, numerically equal strings such as '45.9' and '45.90' map to the same signature.)

The key is used to group coordinate items, and the type of the grouped value tells whether the signature so far refers to a single item or to several same-coordinate items (duplicates). As soon as at least one pair has been found, those items are also collected in the internal accumulator's list. Thus this approach iterates just once and delivers the final result at the end of a single reduce cycle.

function collectDuplicates(collector, item) {
  const { index, list } = collector;
  const { latitude, longitude } = item;

  const key = [
    parseFloat(latitude),
    parseFloat(longitude),
  ].join('/');

  const grouped = index[key];

  if (Array.isArray(grouped)) {
    // a duplicate group (an array of 2 or more items) already exists.

    grouped.push(item);
    list.push(item);

  } else if (grouped) {
    // first duplicate detected; promote the entry to a group of 2 items.

    index[key] = [grouped, item];
    list.push(grouped, item);

  } else {
    // register first item of its kind.
    index[key] = item;
  }
  return collector;
}

const data = [{
  name: 'x',
  latitude: '45.9',
  longitude: '50.2'
}, {
  name: 'y',
  latitude: '45.9',
  longitude: '50.2'
}, {
  name: 'z',
  latitude: '40.5',
  longitude: '85.7'
}];

const duplicates =
  data.reduce(collectDuplicates, { index: {}, list: [] }).list;

console.log({ duplicates });

console.log(
  'data.reduce(collectDuplicates, { index: {}, list: [] }) ...',
  data.reduce(collectDuplicates, { index: {}, list: [] })
)

Upvotes: 1

Tushar Shahi

Reputation: 20701

Instead of findIndex, you can run a for loop with the extra condition that the index should not be the same as the index you are checking.

Based on that, you can return directly from inside the loop.

var data = [
  { name: 'x', latitude: '45.9', longitude: '50.2' },
  { name: 'y', latitude: '45.9', longitude: '50.2' },
  { name: 'z', latitude: '40.5', longitude: '85.7' },
];

var duplicates = data.filter((obj, index, array) => {
  for (let i = 0; i < array.length; i++) {
    if (i != index && array[i].latitude == obj.latitude
        && array[i].longitude == obj.longitude) {
      return true;
    }
  }
  return false;
});

console.log(duplicates);

Upvotes: 2
