Reputation: 16112
I want to deduplicate an array of arrays. A duplicate array is one that matches an earlier array at a given subset of element indices. In this case, say, index [1] and index [3].
const unDeduplicated = [
[ 11, 12, 13, 14, 15, ],
[ 21, 22, 23, 24, 25, ],
[ 31, 88, 33, 99, 35, ], // duplicate in indices: 1, 3 with row index 4
[ 41, 42, 43, 44, 45, ],
[ 51, 88, 53, 99, 55, ], // duplicate in indices: 1, 3 // delete this row from result
];
const deduplicated = getDeduplicated( unDeduplicated, [ 1, 3, ], );
console.log( deduplicated );
// expected result:
// [
// [ 11, 12, 13, 14, 15, ],
// [ 21, 22, 23, 24, 25, ],
// [ 31, 88, 33, 99, 35, ],
// [ 41, 42, 43, 44, 45, ],
// // this row was omitted from result because it was duplicated at indices 1 and 3 with row index 2
// ]
What would a function getDeduplicated() look like that gives me this result?
I have tried the function below, but it's just a start and isn't close to giving me the desired result. It does give an idea of what I'm trying to do.
/**
* Returns deduplicated array as a data grid ([][] -> 2D array)
* @param { [][] } unDedupedDataGrid The original data grid to be deduplicated to include only unique rows as defined by the indices2compare.
* @param { Number[] } indices2compare An array of indices to compare for each array element.
* If every element at each index for a given row is duplicated elsewhere in the array,
* then the array element is considered a duplicate
* @returns { [][] }
*/
const getDeduplicated = ( unDedupedDataGrid, indices2compare, ) => {
let deduped = [];
unDedupedDataGrid.forEach( row => {
const matchedArray = a.filter( row => row[1] === 88 && row[3] === 99 );
const matchedArrayLength = matchedArray.length;
if( matchedArrayLength ) return;
deduped.push( row, );
});
}
I've researched some lodash functions that might help, like _.filter and _.some, but so far I can't seem to find a structure that produces the desired result.
Upvotes: 0
Views: 414
Reputation: 14318
It's probably not the most efficient algorithm, but I'd do something like
function getDeduplicated(unDeduplicated, idxs) {
const result = [];
const used = new Set();
unDeduplicated.forEach(arr => {
const vals = idxs.map(i => arr[i]).join();
if (!used.has(vals)) {
result.push(arr);
used.add(vals);
}
});
return result;
}
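One caveat with using join() for the key: values that themselves contain commas (or a number and its string form, like 1 and "1") can produce the same key. If that matters for your data, a variant along these lines might help. This is only a sketch; the name getDeduplicatedByKey is made up for illustration, and it assumes the compared values are JSON-serializable.

```javascript
// Hypothetical variant: build the lookup key with JSON.stringify instead of join(),
// so ["a,b", "c"] and ["a", "b,c"] no longer collapse into the same key.
function getDeduplicatedByKey(rows, idxs) {
  const seen = new Set();
  return rows.filter((row) => {
    // Serialize only the values at the compared indices.
    const key = JSON.stringify(idxs.map((i) => row[i]));
    if (seen.has(key)) return false; // an earlier row already produced this key
    seen.add(key);
    return true; // first occurrence: keep it
  });
}
```

It keeps the first row for each key and leaves the input array untouched.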
Upvotes: 1
Reputation: 4519
This isn't the most efficient approach, but it removes every later duplicate, even when more than one row is duplicated.
const unDeduplicated = [ [ 11, 12, 13, 14, 15, ], [ 21, 22, 23, 24, 25, ], [ 31, 88, 33, 99, 35, ], [ 41, 33, 43, 44, 45, ], [ 51, 88, 53, 99, 55, ]]
const unDeduplicated1 = [
[ 11, 12, 13, 14, 15, ],
[ 21, 22, 23, 24, 25, ],// duplicate in indices: 1, 3 with row index 3
[ 31, 88, 33, 99, 35, ], // duplicate in indices: 1, 3 with row index 4
[ 21, 22, 43, 24, 45, ],// duplicate in indices: 1, 3 // delete this
[ 51, 88, 53, 99, 55, ], // duplicate in indices: 1, 3 // delete this row from result
];
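function getDeduplicated(arr, arind) {
  const result = arr.slice() // work on a copy so the input array is not mutated
  for (let i = 0; i < result.length; i++) {
    for (let j = i + 1; j < result.length; j++) {
      // a duplicate must match at every requested index, not merely contain the values somewhere
      if (arind.every(k => result[j][k] === result[i][k])) {
        result.splice(j, 1)
        j-- // the next row has shifted into position j, so check it too
      }
    }
  }
  return result
}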
const deduplicated = getDeduplicated(unDeduplicated, [1, 3]);
const deduplicated2 = getDeduplicated(unDeduplicated1, [1, 3]);
console.log(deduplicated)
console.log("#####################")
console.log(deduplicated2)
Upvotes: 0
Reputation: 481
This is fairly concise. It uses nested filters. It will also work for any number of duplicates, keeping only the first one.
const init = [
[ 11, 12, 13, 14, 15],
[ 21, 22, 23, 24, 25],
[ 31, 88, 33, 99, 35],
[ 41, 42, 43, 44, 45],
[ 51, 88, 53, 99, 55],
];
var deDuplicate = function(array, indices){
var res = array.filter(
(elem, elemPos) => !array.some(
(el, elPos) =>
elPos < elemPos && // only compare against earlier rows, so we don't discard the first dupe
el.filter((_, k) => indices.includes(k)).every((l, index) => l === elem.filter((_, k) => indices.includes(k))[index])
// check if the values at the requested indexes are the same, selecting by position
// rather than by indexOf, which picks the wrong position when a row repeats a value.
// Made a bit nasty by the fact that you can't compare arrays with ===
)
);
return(res);
}
console.log(deDuplicate(init,[1,3]));
Upvotes: 0
Reputation: 304
I'm not sure I've understood exactly what you want to do, but here is what I came up with:
const list = [
[ 11, 12, 13, 14, 15, ],
[ 21, 22, 23, 24, 25, ],
[ 21, 58, 49, 57, 28, ],
[ 31, 88, 33, 88, 35, ],
[ 41, 42, 43, 44, 45, ],
[ 51, 88, 53, 88, 55, ],
[ 41, 77, 16, 29, 37, ],
];
const el_list = [] // Auxiliary list of every number seen so far
const res_list = list.reduce(
(_list, row) => {
// console.log(_list)
const this_rows_el = [] // Auxiliary list of this row's elements
_list.push(row.reduce(
(keep_row, el) => {
// console.log(keep_row, this_rows_el, el)
if(keep_row && el_list.indexOf(el)==-1 ){
el_list.push(el)
this_rows_el.push(el)
return true
}else if(this_rows_el.indexOf(el)!=-1) return true // Bypass repeated elements in this row
else return false
}, true) ? row : null) // To get only duplicated rows (...) ? null : row )
return _list
}, []
)
console.log(res_list)
Upvotes: 0
Reputation: 17582
You can build a Set from the values in each column as you iterate over the rows. You could create sets only for the designated columns, e.g. 1 and 3 in your case. Then, for each row, check whether any of its designated columns holds a value that is already in the corresponding set; if it does, discard that row.
(I'm on my phone and can't type actual code, but it should be pretty straightforward.)
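For anyone who wants to see it spelled out, a minimal sketch of this idea might look like the following. The function name dedupeByColumnSets is made up for illustration, and one detail is an assumption on my part: only rows that are kept add their values to the sets. Note that, as described, a row is discarded when any designated column repeats, which is stricter than requiring all of them to match together; for the example in the question both rules give the same result.

```javascript
// Sketch under the assumptions above: one Set per designated column.
function dedupeByColumnSets(rows, columns) {
  // Map each designated column index to the Set of values seen in kept rows.
  const seen = new Map(columns.map((c) => [c, new Set()]));
  const result = [];
  for (const row of rows) {
    // Discard the row if ANY designated column already holds this value.
    const isDuplicate = columns.some((c) => seen.get(c).has(row[c]));
    if (isDuplicate) continue;
    // Assumption: only kept rows contribute their values to the sets.
    columns.forEach((c) => seen.get(c).add(row[c]));
    result.push(row);
  }
  return result;
}

const unDeduplicated = [
  [11, 12, 13, 14, 15],
  [21, 22, 23, 24, 25],
  [31, 88, 33, 99, 35],
  [41, 42, 43, 44, 45],
  [51, 88, 53, 99, 55],
];
console.log(dedupeByColumnSets(unDeduplicated, [1, 3]));
```

Each lookup is a constant-time Set check, so this makes a single pass over the rows.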
Upvotes: 2