Raphael Rafatpanah
Raphael Rafatpanah

Reputation: 19987

Get the intersection of n arrays

Using ES6's Set, given two arrays we can get the intersection like so:

let a = new Set([1,2,3])
let b = new Set([1,2,4])
let intersect = new Set([...a].filter(i => b.has(i)));

How can we get the intersection of n arrays?

Update:

I'm trying to wrap my head around this for the following use case. I have a two dimensional array with at least one element.

parts.forEach(part => {
  intersection = new Set()
})

How would you get the intersection of each element (array) in parts?

Upvotes: 4

Views: 3542

Answers (5)

lovasoa
lovasoa

Reputation: 6855

The most efficient algorithm for intersecting n arrays is the one implemented in fast_array_intersect. It runs in O(n), where n is the total number of elements in all the arrays.

The base principle is simple: iterate over all the arrays, storing the number of times you see each element in a map. Then filter the smallest array, to return only the elements that have been seen in all the arrays. (source code).

You can use the library with a simple :

import intersect from 'fast_array_intersect'

intersect([[1,2,3], [1,2,6]]) // --> [1,2]

Upvotes: 2

user1726343
user1726343

Reputation:

Assuming you have some function function intersect(set1, set2) {...} that can intersect two sets, you can get the intersection of an array of sets using reduce:

function intersect(a, b) {
    return new Set(a.filter(i => b.has(i)));
}

var sets = [new Set([1,2,3]), ...];
var intersection = sets.reduce(intersect);

Upvotes: 8

Redu
Redu

Reputation: 26181

OK i guess the most efficient way of performing the Array intersection is by utilizing a Map or Hash object. Here I test 1000 arrays each with ~1000 random integer items among 1..175 for an intersection. The result is obtained in less than 100msec.

function setIntersection(a){
  var m = new Map(),
      r = new Set(),
      l = a.length;
  a.forEach(sa => new Set(sa).forEach(n => m.has(n) ? m.set(n,m.get(n)+1)
                                                    : m.set(n,1)));
  m.forEach((v,k) => v === l && r.add(k));
  return r;
}

var testSets = Array(1000).fill().map(_ => Array(1000).fill().map(_ => ~~(Math.random()*175+1)));
console.time("int");
result = setIntersection(testSets);
console.timeEnd("int");
console.log(JSON.stringify([...result]));

Upvotes: 1

Blindman67
Blindman67

Reputation: 54079

ES6 still has a while

This is the type of function that can easily cause long lags due to excessive amounts of processing. This is more true with the unquestioning and even preferential use of ES6 and array methods like reduce, filter etc, over simple old fashioned loops like while and for.

When calculating the intersection of many sets the amount of work done per iteration should go down if an item has been found not to be part of the intersection. Because forEach can not break you are forced to still iterate all elements. Adding some code to avoid doing the search if the current item has been found to not belong can improve the performance, but it is a real kludge.

The is also the tendency to just create whole new datasets just to remove a single item from an array, set, or map. This is a very bad habit that i see more and more of as people adopt the ES5 way.

Get the intersection of n sets.

So to the problem at hand. Find the intersection of many sets.

Solution B

A typical ES6 solution

function intersectB(firstSet, ...sets) {
    // function to intercept two sets
    var intersect = (a,b) => {
        return new Set([...a].filter(item => b.has(item)))
    };

    // iterate all sets comparing the first set to each.
    sets.forEach(sItem => firstSet = intersect(firstSet, sItem));

    // return the result.
    return firstSet;
}

var sets = [new Set([1,2,3,4]), new Set([1,2,4,6,8]), new Set([1,3,4,6,8])];
var inter = intersectB(...sets);
console.log([...inter]);

Works well and for the simple test case execution time is under a millisecond. But in my book it is a memory hogging knot of inefficiency, creating arrays, and sets at every line almost and iterating whole sets when the outcome is already known.

Let's give it some more work. 100 sets, with up to 10000 items over 10 tests each with differing amount of matching items. Most of the intercepts will return empty sets.

Warning will cause page to hang up to one whole second... :(

// Create a set of numbers from 0 and < count
// With a odds for any number occurring to be odds
// return as a new set;
function createLargeSet(count,odds){
    var numbers = new Set();
    while(count-- > 0){
        if(Math.random() < odds){
            numbers.add(count);
        }
    }
    return numbers;
}
// create a array of large sets
function bigArrayOfSets(setCount,setMaxSize,odds){
    var bigSets = [];
    for(var i = 0; i < setCount; i ++){
        bigSets.push(createLargeSet(setMaxSize,odds));
    }
    return bigSets;
}
function intersectB(firstSet, ...sets) {
    var intersect = (a,b) => {
        return new Set([...a].filter(item => b.has(item)))
    };
    sets.forEach(sItem => firstSet = intersect(firstSet, sItem));
    return firstSet;
}
var testSets = [];
for(var i = 0.1; i <= 1; i += 0.1){
    testSets.push(bigArrayOfSets(100,10000,i));
}

var now = performance.now();
testSets.forEach(testDat => intersectB(...testDat));
var time = performance.now() - now;
console.log("Execution time : " + time);

Solution A

A better way, not as fancy but much more efficient.

function intersectA(firstSet,...sets) {
    var count = sets.length;
    var result = new Set(firstSet); // Only create one copy of the set
    firstSet.forEach(item => {
        var i = count;
        var allHave = true;
        while(i--){
            allHave = sets[i].has(item)
            if(!allHave) { break }  // loop only until item fails test
        }
        if(!allHave){
            result.delete(item);  // remove item from set rather than
                                  // create a whole new set
        }
    })
    return result;
}

Compare

So now let's compare both, if you are feeling lucky try and guess the performance difference, it's a good way to gage your understanding of Javascript execution.

// Create a set of numbers from 0 and < count
// With a odds for any number occurring to be odds
// return as a new set;
function createLargeSet(count,odds){
    var numbers = new Set();
    while(count-- > 0){
        if(Math.random() < odds){
            numbers.add(count);
        }
    }
    return numbers;
}
// create a array of large sets
function bigArrayOfSets(setCount,setMaxSize,odds){
    var bigSets = [];
    for(var i = 0; i < setCount; i ++){
        bigSets.push(createLargeSet(setMaxSize,odds));
    }
    return bigSets;
}
function intersectA(firstSet,...sets) {
    var count = sets.length;
    var result = new Set(firstSet); // Only create one copy of the set
    firstSet.forEach(item => {
        var i = count;
        var allHave = true;
        while(i--){
            allHave = sets[i].has(item)
            if(!allHave) { break }  // loop only until item fails test
        }
        if(!allHave){
            result.delete(item);  // remove item from set rather than
                                  // create a whole new set
        }
    })
    return result;
}

function intersectB(firstSet, ...sets) {
    var intersect = (a,b) => {
        return new Set([...a].filter(item => b.has(item)))
    };
    sets.forEach(sItem => firstSet = intersect(firstSet, sItem));
    return firstSet;
}
var testSets = [];
for(var i = 0.1; i <= 1; i += 0.1){
    testSets.push(bigArrayOfSets(100,10000,i));
}

var now = performance.now();
testSets.forEach(testDat => intersectB(...testDat));
var time = performance.now() - now;
console.log("Execution time 'intersectB' : " + time);

var now = performance.now();
testSets.forEach(testDat => intersectA(...testDat));
var time = performance.now() - now;
console.log("Execution time 'intersectA' : " + time);

As you can see using a simple while loop may not be a cool as using filter but the performance benefit is huge, and something to keep in mind next time you are writing that perfect 3 line ES6 array manipulation function. Dont forget about for and while.

Upvotes: 2

gyre
gyre

Reputation: 16787

You can create an intersect helper function using a combination of Array methods like .filter(), .map(), and .every().

This answer is inspired by the comment above from Xufox, who mentioned using Array#every in a filter predicate.

function intersect (first = [], ...rest) {
   rest = rest.map(array => new Set(array))
   return first.filter(e => rest.every(set => set.has(e)))
}

let parts = [
  [1, 2, 3],
  [1, 2, 4],
  [1, 5, 2]
]

console.log(
  intersect(...parts)
)

Upvotes: 4

Related Questions