Reputation: 359906
Let A
and B
be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B
or A \B
, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.
Notes:
Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.
Upvotes: 186
Views: 146949
Reputation: 48640
As of June 2024, the ECMAScript TC39 proposal for set methods is in Stage 3 (Candidate).
Update: as of July 6th, 2024, the proposal is in Stage 4 (Draft).
Set.prototype.intersection(other)
Set.prototype.union(other)
Set.prototype.difference(other)
Set.prototype.symmetricDifference(other)
Set.prototype.isSubsetOf(other)
Set.prototype.isSupersetOf(other)
Set.prototype.isDisjointFrom(other)
const
a = new Set([1, 2, 3, 4]),
b = new Set([5, 4, 3, 2]);
console.log(...[...a.union(b)]); // [1, 2, 3, 4, 5]
console.log(...[...a.intersection(b)]); // [2, 3, 4]
console.log(...[...a.difference(b)]); // [1]
console.log(...[...b.difference(a)]); // [5]
console.log(...[...a.symmetricDifference(b)]); // [1, 5]
const
c = new Set(['A', 'B', 'C', 'D', 'E']),
d = new Set(['B', 'D']);
console.log(d.isSubsetOf(c)); // true
console.log(c.isSupersetOf(d)); // true
const
e = new Set(['A', 'B', 'C']),
f = new Set(['X', 'Y', 'Z']);
console.log(e.isDisjointFrom(f)); // true
.as-console-wrapper { top: 0; max-height: 100% !important; }
All major browsers now support the following methods as of June 11, 2024.
Type | Name | Version | Date |
---|---|---|---|
Desktop | Chrome | 122 | 2024-02-20 |
Desktop | Edge | 122 | 2024-02-23 |
Desktop | Firefox | 127 | 2024-06-11 |
Desktop | Opera | 108 | 2024-03-05 |
Desktop | Safari | 17 | 2023-09-18 |
Mobile | Chrome Android | 122 | 2024-02-20 |
Mobile | Firefox for Android | 127 | 2024-06-11 |
Mobile | Opera Android | 81 | 2024-03-14 |
Mobile | Safari on iOS | 17 | 2023-09-18 |
Mobile | Samsung Internet | - | - |
Mobile | WebView Android | 122 | 2024-02-20 |
Server | Deno | 1.42 | 2024-03-28 |
Server | Node.js | 22.0.0 | 2024-04-25 |
Other | TypeScript | 5.5 | 2024-06-20 |
The function below are ports of the methods found in Python's set()
class and follows the TC39 Set methods proposal.
const
union = (a, b) => new Set([...a, ...b]),
intersection = (a, b) => new Set([...a].filter(x => b.has(x))),
difference = (a, b) => new Set([...a].filter(x => !b.has(x))),
symmetricDifference = (a, b) => union(difference(a, b), difference(b, a)),
isSubsetOf = (a, b) => [...b].every(x => a.has(x)),
isSupersetOf = (a, b) => [...a].every(x => b.has(x)),
isDisjointFrom = (a, b) => !intersection(a, b).size;
const
a = new Set([1, 2, 3, 4]),
b = new Set([5, 4, 3, 2]);
console.log(...union(a, b)); // [1, 2, 3, 4, 5]
console.log(...intersection(a, b)); // [2, 3, 4]
console.log(...difference(b, a)); // [1]
console.log(...difference(a, b)); // [5]
console.log(...symmetricDifference(a, b)); // [1, 5]
const
c = new Set(['A', 'B', 'C', 'D', 'E']),
d = new Set(['B', 'D']);
console.log(isSubsetOf(c, d)); // true
console.log(isSupersetOf(d, c)); // true
const
e = new Set(['A', 'B', 'C']),
f = new Set(['X', 'Y', 'Z']);
console.log(isDisjointFrom(e, f)); // true
.as-console-wrapper { top: 0; max-height: 100% !important; }
Upvotes: 10
Reputation: 19564
Use Set.prototype.difference()
:
A.difference(B)
In theory, the time complexity should be Θ(n)
, where n
is the number of elements in B
.
You can also polyfill this into older environments with [core-js
]:
import "core-js"
Upvotes: 1
Reputation: 36
Answer provided by @koblas is good but returns items that are in both arrays aswell. With a slight modification (in ES6) for my use case where I want to get the difference, (with the intention of retrieving new items in array_j
, as well as the the items in array_i
that are not in array j
as separate output arrays, these are the 3 main ways provided to do this:
var arr_i = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
var arr_j = ["a", "c", "d", "f", "g", "h", "j", "k", "l", "n"];
The answers should be the new items in array j as ['b', 'e', 'i']
as well as the items in array i that are not in array j as ['k', 'l', 'n']
// Convert to Set
var set_i = new Set(arr_i);
var set_j = new Set(arr_j);
const changes = (arr1, arr2) => {
// Using Array method
let turn_on = arr2.filter((x) => !arr1.includes(x));
let turn_off = arr1.filter((x) => !arr2.includes(x));
return { turn_on, turn_off };
};
const setChanges = (set1, set2) => {
// Using Set method
let turn_on = new Set([...set2].filter((x) => !set1.has(x)));
let turn_off = new Set([...set1].filter((x) => !set2.has(x)));
return { turn_on, turn_off };
};
function* setMinus(setA, setB) {
// Using Set method with generator by @koblas
for (const v of setB.values()) {
// .delete returns true if value was already in Set; otherwise false.
if (!setA.delete(v)) {
yield v;
}
}
}
const changesGenerator = (set1, set2) => {
let turn_off = Array.from(setMinus(set2, set1));
let turn_on = Array.from(setMinus(set1, set2));
return { turn_on, turn_off };
};
All three methods return:
{ turn_on: [ 'k', 'l', 'n' ], turn_off: [ 'b', 'e', 'i' ] }
Timing these on random array including numbers from range [0,10000] containing 5000 items
let arr_i = Array.from({ length: 5000 }, () =>
Math.floor(Math.random() * 10000)
);
let arr_j = Array.from({ length: 5000 }, () =>
Math.floor(Math.random() * 10000)
);
var set_i = new Set(arr_i);
var set_j = new Set(arr_j);
console.time("Array method");
changes(arr_i, arr_j);
console.timeEnd("Array method");
console.time("Set method");
setChanges(set_i, set_j);
console.timeEnd("Set method");
console.time("Generator method");
changesGenerator(set_i, set_j);
console.timeEnd("Generator method");
Returns:
Array method: 36.894ms
Set method: 1.14ms
Generator method: 2.155ms
So yeah, just use:
let set1_minus_set2 = new Set([...set1].filter((x) => !set2.has(x)));
Upvotes: 0
Reputation: 27088
Looking at a lof of these solutions, they do fine for small cases. But, when you blow them up to a million items, the time complexity starts getting silly.
A.filter(v => B.includes(v))
That starts looking like an O(N^2) solution. Since there is an O(N) solution, let's use it, you can easily modify to not be a generator if you're not up to date on your JS runtime.
function *setMinus(A, B) {
const setA = new Set(A);
const setB = new Set(B);
for (const v of setB.values()) {
if (!setA.delete(v)) {
yield v;
}
}
for (const v of setA.values()) {
yield v;
}
}
a = [1,2,3];
b = [2,3,4];
console.log(Array.from(setMinus(a, b)));
While this is a bit more complex than many of the other solutions, when you have large lists this will be far faster.
Let's take a quick look at the performance difference, running it on a set of 1,000,000 random integers between 0...10,000 we see the following performance results.
setMinus time = 181 ms
diff time = 19099 ms
function buildList(count, range) {
result = [];
for (i = 0; i < count; i++) {
result.push(Math.floor(Math.random() * range))
}
return result;
}
function *setMinus(A, B) {
const setA = new Set(A);
const setB = new Set(B);
for (const v of setB.values()) {
if (!setA.delete(v)) {
yield v;
}
}
for (const v of setA.values()) {
yield v;
}
}
function doDiff(A, B) {
return A.filter(function(x) { return B.indexOf(x) < 0 })
}
const listA = buildList(100_000, 100_000_000);
const listB = buildList(100_000, 100_000_000);
let t0 = process.hrtime.bigint()
const _x = Array.from(setMinus(listA, listB))
let t1 = process.hrtime.bigint()
const _y = doDiff(listA, listB)
let t2 = process.hrtime.bigint()
console.log("setMinus time = ", (t1 - t0) / 1_000_000n, "ms");
console.log("diff time = ", (t2 - t1) / 1_000_000n, "ms");
Upvotes: 23
Reputation: 12412
Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as python's A - B
), and reportedly faster than indexOf
for large arrays:
console.clear();
let a = new Set([1, 2, 3, 4]);
let b = new Set([5, 4, 3, 2]);
let a_minus_b = new Set([...a].filter(x => !b.has(x)));
let b_minus_a = new Set([...b].filter(x => !a.has(x)));
let a_intersect_b = new Set([...a].filter(x => b.has(x)));
let a_union_b = new Set([...a, ...b]);
console.log(...a_minus_b); // {1}
console.log(...b_minus_a); // {5}
console.log(...a_intersect_b); // {2,3,4}
console.log(...a_union_b); // {1,2,3,4,5}
Upvotes: 187
Reputation: 53940
I don't know if this is most effective, but perhaps the shortest:
var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];
var diff = A.filter(function(x) {
return B.indexOf(x) < 0;
});
console.log(diff); // [2]
Updated to ES6:
const A = [1, 2, 3, 4];
const B = [1, 3, 4, 7];
const diff = A.filter(x => !B.includes(x));
console.log(diff); // [2]
Upvotes: 232
Reputation: 4657
If you're using Set
s, it can be quite simple and performant:
function setDifference(a, b) {
return new Set(Array.from(a).filter(item => !b.has(item)));
}
Since Set
s use Hash functions* under the hood, the has
function is much faster than indexOf
(this matters if you have, say, more than 100 items).
Upvotes: 18
Reputation: 643
As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:
var t, a, b, c, objA;
// Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
return (i*2).toFixed();
});
// Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);
// Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
objA = {};
a.forEach(function(v) { objA[v] = true; });
c = b.filter(function(v) { return !objA[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);
Results:
completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length
However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseFloat.
Upvotes: 4
Reputation: 22032
Some simple functions, borrowing from @milan's answer:
const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);
Usage:
const a = new Set([1, 2]);
const b = new Set([2, 3]);
setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }
Upvotes: 7
Reputation: 6620
Using Underscore.js (Library for functional JS)
>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]
Upvotes: 6
Reputation: 132
The shortest, using jQuery, is:
var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];
var diff = $(A).not(B);
console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Upvotes: 8
Reputation: 169633
You can use an object as a map to avoid linearly scanning B
for each element of A
as in user187291's answer:
function setMinus(A, B) {
var map = {}, C = [];
for(var i = B.length; i--; )
map[B[i].toSource()] = null; // any other value would do
for(var i = A.length; i--; ) {
if(!map.hasOwnProperty(A[i].toSource()))
C.push(A[i]);
}
return C;
}
The non-standard toSource()
method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource()
invocations.
Upvotes: 14
Reputation: 8999
Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each
and friends), we can get set difference, union and intersection in linear time in about 20 lines total:
var setOPs = {
minusAB : function (a, b) {
var h = {};
b.each(function (v) { h[v] = true; });
return a.filter(function (v) { return !h.hasOwnProperty(v); });
},
unionAB : function (a, b) {
var h = {}, f = function (v) { h[v] = true; };
a.each(f);
b.each(f);
return myUtils.keys(h);
},
intersectAB : function (a, b) {
var h = {};
a.each(function (v) { h[v] = 1; });
b.each(function (v) { h[v] = (h[v] || 0) + 1; });
var fnSel = function (v, count) { return count > 1; };
var fnVal = function (v, c) { return v; };
return myUtils.select(h, fnSel, fnVal);
}
};
This assumes that each
and filter
are defined for arrays, and that we have two utility methods:
myUtils.keys(hash)
: returns an
array with the keys of the hash
myUtils.select(hash, fnSelector,
fnEvaluator)
: returns an array with
the results of calling fnEvaluator
on the key/value pairs for which
fnSelector
returns true.
The select()
is loosely inspired by Common Lisp, and is merely filter()
and map()
rolled into one. (It would be better to have them defined on Object.prototype
, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)
Performance: Testing with
var a = [], b = [];
for (var i = 100000; i--; ) {
if (i % 2 !== 0) a.push(i);
if (i % 3 !== 0) b.push(i);
}
gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)
I think that's decent payoff for 20 lines of code.
Upvotes: 5
Reputation: 634
This works, but I think another one is much more shorter, and elegant too
A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];
diff_set = {
ar : {},
diff : Array(),
remove_set : function(a) { ar = a; return this; },
remove: function (el) {
if(ar.indexOf(el)<0) this.diff.push(el);
}
}
A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;
Upvotes: 1
Reputation:
I would hash the array B, then keep values from the array A not present in B:
function getHash(array){
// Hash an array into a set of properties
//
// params:
// array - (array) (!nil) the array to hash
//
// return: (object)
// hash object with one property set to true for each value in the array
var hash = {};
for (var i=0; i<array.length; i++){
hash[ array[i] ] = true;
}
return hash;
}
function getDifference(a, b){
// compute the difference a\b
//
// params:
// a - (array) (!nil) first array as a set of values (no duplicates)
// b - (array) (!nil) second array as a set of values (no duplicates)
//
// return: (array)
// the set of values (no duplicates) in array a and not in b,
// listed in the same order as in array a.
var hash = getHash(b);
var diff = [];
for (var i=0; i<a.length; i++){
var value = a[i];
if ( !hash[value]){
diff.push(value);
}
}
return diff;
}
Upvotes: 6