Reputation: 6873
I'm using a .reduce method to iterate through an array of objects in order to return the array index of the object that best fits certain conditions. My array has around 30,000 entries right now and I'm shooting for well over a million. Trouble is, iterating through the array using .reduce takes forever: almost 4 seconds at the moment, and imagine how long it would take if the array had my projected 1 million entries. The array is compact. I'm not connected to a database or server. Here is my code:
var startMatchMaking = function () {
    var loopCounter = 0;
    var length = personArray.length;
    do {
        var manCounter = 0;
        loopCounter++;
        for (var i = length; i--;) {
            if (!personArray[i].isSingle && personArray[i].sex === "Male" &&
                personArray[i].isAvailable === true) {
                manCounter++;
                var num = normalRandomScaled(2.1, 12.44);
                var result = personArray.reduce(function (p, c, k, a) {
                    return c.sex !== personArray[i].sex &&
                        !c.isSingle && c.isAvailable === true &&
                        c.age <= (personArray[i].age + num) &&
                        c.age >= (personArray[i].age - num) ? k : p;
                }, 0);
                result = !personArray[result].isSingle &&
                    personArray[result].sex !== personArray[i].sex &&
                    personArray[result].age <= (personArray[i].age + num) &&
                    personArray[result].age >= (personArray[i].age - num) ? result : -1;
                if (result >= 0) {
                    householdArray.push(new Household(personArray[i], personArray[result]));
                    personArray[result].isAvailable = false;
                    personArray[i].isAvailable = false;
                }
            }
        }
        document.write("<br>Mancounter is: " + manCounter +
            " loopCounter is: " + loopCounter + " households: " + householdArray.length);
    } while (manCounter > 0 && loopCounter <= 5);
};
startMatchMaking();
CONTEXT: I'm trying to develop a standalone application to run an agent-based demographics simulation. personArray essentially contains 30,000 human beings. The particular bit of code above is related to the initial setup of the population. Person objects were previously created and pushed to the array. Each Person object has a firstName, lastName, sex, age and isSingle property, each assigned a random value. At this stage in the program, I need to take the Persons who are meant to be not single and match them with a suitable mate of the opposing sex and a compatible age to form households.
How can I optimize this in order to increase performance significantly? I'm open to small changes or to completely different alternatives that would produce the same result.
Upvotes: 0
Views: 1561
Reputation: 1424
You use reduce to iterate over all elements inside a loop that also iterates over all elements, as already mentioned in the comments. This leads to quadratic complexity: if you double the number of persons, the runtime of the algorithm is multiplied by 4. So it is completely unfeasible to process millions of people this way.
It seems to me that the inner iteration over all elements is not necessary. You can replace reduce with an ordinary loop and stop iterating as soon as a match is found. The current solution takes the last found match: is there anything that makes the last one better than the first one, or am I missing something? You could also randomly pick some indices when searching for a match and stop when a match is found. This is a solution that does not require many changes, and I expect it to make a big difference except for very young and very old people (outliers).
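For illustration, here is a minimal sketch of that early-exit idea, using the property names from the question (the function name findFirstMatch is mine):

```javascript
// Sketch: replace reduce with a plain loop that stops at the first match.
// Returns the index of the first compatible partner for personArray[i], or -1.
function findFirstMatch(personArray, i, num) {
    var p = personArray[i];
    for (var k = 0; k < personArray.length; k++) {
        var c = personArray[k];
        if (k !== i &&
            c.sex !== p.sex &&
            !c.isSingle && c.isAvailable &&
            c.age <= p.age + num &&
            c.age >= p.age - num) {
            return k; // stop scanning as soon as a match is found
        }
    }
    return -1;
}
```

On random data this typically exits after a handful of iterations instead of scanning all 30,000 entries the way reduce does.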
A solution that requires more changes is to map people by their properties, as similarly mentioned in the comments, so that you can do something like matchCandidates = people[oppositeSex][ageWithSomeRandomness]. Have a look at this post for more information about possible implementations of maps and hashtables in JavaScript.
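A rough sketch of that lookup idea, assuming integer ages (the bucket layout and both function names are my own, not from the question):

```javascript
// Sketch: index available non-singles by sex and integer age, so finding
// candidates only touches the few small buckets inside the age window.
function buildIndex(personArray) {
    var index = { "Male": {}, "Female": {} };
    personArray.forEach(function (p) {
        if (!p.isSingle && p.isAvailable) {
            (index[p.sex][p.age] = index[p.sex][p.age] || []).push(p);
        }
    });
    return index;
}

// Collect candidates of the opposite sex within +/- num years.
function matchCandidates(index, person, num) {
    var oppositeSex = person.sex === "Male" ? "Female" : "Male";
    var result = [];
    for (var age = person.age - num; age <= person.age + num; age++) {
        var bucket = index[oppositeSex][age];
        if (bucket) result = result.concat(bucket);
    }
    return result;
}
```

Each lookup now scans only the people inside the age window instead of the whole array.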
An additional improvement might be achieved by filtering the people at the beginning so that no singles are included, i.e. copying people who are not single into a new array and only accessing the new array in the algorithm.
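That prefiltering is a one-liner (the helper name buildMatchPool is mine):

```javascript
// Copy only the people who actually want a partner; the matching loops
// then never waste iterations on singles or unavailable people.
function buildMatchPool(personArray) {
    return personArray.filter(function (p) {
        return !p.isSingle && p.isAvailable;
    });
}
```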
If your code runs in a browser, you can use web workers to keep the page from freezing while the matching runs.
Upvotes: 2
Reputation: 4283
As others and I have stated in the comments: your approach is brute force, which in this case is quadratic in the input size. There are several optimisation possibilities. For binary values (i.e. booleans), splitting the array into categories is trivial. Number values like age may be clustered, e.g. into ranges. And you should definitely adopt early abort, as mentioned by mm759. TL;DR: there's a table and a conclusion at the bottom.
Consider the bruteforce approach (for reference):
// The result is a list of matches [[candidate.id, match.id (or -1)]]
function bruteforce(arr) {
    var matches = [];
    for (var i = 0; i < arr.length; ++i) {
        var candidate = arr[i], num = 12;
        var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
        var cSex = candidate.sex;
        var cSexPref = candidate.sexPref;
        var cAgeMin = candidate.age - num;
        var cAgeMax = candidate.age + num;
        var result = !isCandidate ? -1 : arr.reduce(function (p, c, k, a) {
            return k != i &&
                c.sex == cSexPref &&
                c.sexPref == cSex &&
                !c.isSingle && c.isAvailable &&
                c.age <= cAgeMax &&
                c.age >= cAgeMin ? k : p;
        }, -1);
        if (isCandidate)
            matches.push([i, result]);
    }
    return matches;
}
The category approach might look like this:
function useTheCategory(arr) {
    // preprocessing the data
    var wmNonsingleAvailables = [];
    var wwNonsingleAvailables = [];
    var mwNonsingleAvailables = [];
    var mmNonsingleAvailables = [];
    // split the data into categories
    arr.forEach(function (c) {
        if (!c.isSingle && c.isAvailable) {
            if (c.sex == 0) {
                if (c.sexPref == 1)
                    wmNonsingleAvailables.push(c);
                else
                    wwNonsingleAvailables.push(c);
            } else {
                if (c.sexPref == 0)
                    mwNonsingleAvailables.push(c);
                else
                    mmNonsingleAvailables.push(c);
            }
        }
    });
    var matches = [];
    for (var i = 0; i < arr.length; ++i) {
        var candidate = arr[i], num = 12;
        var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
        var cSex = candidate.sex;
        var cSexPref = candidate.sexPref;
        var cAgeMin = candidate.age - num;
        var cAgeMax = candidate.age + num;
        if (isCandidate) {
            var category = null;
            // find the relevant category (in this case)
            // a more complex approach/split might include multiple categories here
            if (cSex == 0) {
                if (cSexPref == 1)
                    category = mwNonsingleAvailables;
                else if (cSexPref == 0)
                    category = wwNonsingleAvailables;
            } else if (cSex == 1) {
                if (cSexPref == 0)
                    category = wmNonsingleAvailables;
                else if (cSexPref == 1)
                    category = mmNonsingleAvailables;
            }
            var result = -1;
            if (category == null) {
                // always handle the error case...
                console.log("logic error: missing category!");
                console.log("candidate: " + JSON.stringify(candidate));
            } else {
                // the tests for matching sex/single/availability are left-overs and not necessarily required,
                // they are left in here to show that the reduce is not the culprit of your slowdown
                var match = category.reduce(function (p, c, k, a) {
                    return c.id != i &&
                        c.sex == cSexPref &&
                        c.sexPref == cSex &&
                        !c.isSingle && c.isAvailable &&
                        c.age <= cAgeMax &&
                        c.age >= cAgeMin ? k : p;
                }, -1);
                // translate to arr index
                if (match != -1)
                    result = category[match].id;
            }
            matches.push([i, result]);
        }
    }
    return matches;
}
And the age range bucket approach might look like this:
function useAgeRange(arr) {
    // preprocessing the data
    var ranges = [1, 2, 3, 4, 5]; // find appropriate ranges to spread the entries evenly (analyse your data, more below...)
    var ageBuckets = [];
    // find the range of age values
    var ageRange = arr.length == 0 ? [0, 0] : arr.reduce(function (p, c) {
        var min = c.age < p[0] ? c.age : p[0];
        var max = c.age > p[1] ? c.age : p[1];
        return [min, max];
    }, [arr[0].age, arr[0].age]);
    // build the buckets (floor for nicer values)
    for (var age = Math.floor(ageRange[0]), maxAge = ageRange[1], step = 0; age <= maxAge; age += step) {
        // update step size
        if (step == 0)
            step = ranges[0];
        else
            step = ranges[Math.min(ranges.length - 1, ranges.indexOf(step) + 1)];
        ageBuckets.push({
            nextAge: age + step,
            bucket: [],
        });
    }
    function findBucketIndex(age) {
        // min i with age < ageBuckets[i].nextAge
        for (var i = 0, maxi = ageBuckets.length - 1; i < maxi; ++i)
            if (age < ageBuckets[i + 1].nextAge)
                return i;
        return -1;
    }
    arr.forEach(function (c) {
        ageBuckets[findBucketIndex(c.age)].bucket.push(c);
    });
    var matches = [];
    for (var i = 0; i < arr.length; ++i) {
        var candidate = arr[i], num = 12;
        var isCandidate = !candidate.isSingle && candidate.isAvailable && candidate.sex == 1;
        var cSex = candidate.sex;
        var cSexPref = candidate.sexPref;
        var cAgeMin = candidate.age - num;
        var cAgeMax = candidate.age + num;
        if (isCandidate) {
            // Find range intersection with ageBuckets
            var startBucket = findBucketIndex(cAgeMin);
            var endBucket = findBucketIndex(cAgeMax);
            if (startBucket < 0) startBucket = 0;
            if (endBucket < 0) endBucket = ageBuckets.length - 1;
            var result = -1;
            // now only search those candidate buckets
            for (var b = startBucket; b <= endBucket; ++b) {
                var bucket = ageBuckets[b].bucket;
                var match = bucket.reduce(function (p, c, k, a) {
                    return c.id != i &&
                        c.sex == cSexPref &&
                        c.sexPref == cSex &&
                        !c.isSingle && c.isAvailable &&
                        c.age <= cAgeMax &&
                        c.age >= cAgeMin ? k : p;
                }, -1);
                // translate to arr index
                if (match >= 0)
                    result = bucket[match].id;
            }
            matches.push([i, result]);
        }
    }
    return matches;
}
I created a benchmark showing the improvements of the two approaches on jsfiddle. Both are effective by themselves (even including the preprocessing, values will differ across systems and browsers):
     N    Search space   Brute force   Categories   Range buckets
          (#matches)     (relative timing values)
 20000        2500            200           34            140
 40000        5000           1400          180            556
 80000       10000           5335          659           2582
160000       20000          17000         2450          16900
Analysing your data to find which approach is appropriate is everything: my benchmark generates an exponential distribution (ages 18-20 are 28% of the data points, 21-32 another 27%, 33-52 another 27%, with 53-77 leaving about 18%). The range approach doesn't deal well with this distribution, as we can see in the timings above (that's for a fixed num = 12 years and 14 buckets), because for most of the queries an age range of 24 years spans 55% of the data.
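One way to "find appropriate ranges to spread the entries evenly", as the comment in useAgeRange puts it, is to place the bucket boundaries at equal-frequency quantiles of the ages. A sketch (the helper name quantileBoundaries is illustrative):

```javascript
// Sketch: compute numBuckets-1 boundary ages so that each bucket holds
// roughly the same number of people, whatever the age distribution is.
function quantileBoundaries(ages, numBuckets) {
    var sorted = ages.slice().sort(function (a, b) { return a - b; });
    var boundaries = [];
    for (var i = 1; i < numBuckets; i++) {
        boundaries.push(sorted[Math.floor(i * sorted.length / numBuckets)]);
    }
    return boundaries;
}
```

With equal-frequency buckets, a skewed distribution like the one above can no longer concentrate most of the data in the few buckets that most queries hit.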
Upvotes: 1
Reputation: 6130
I think you'll need some pre-processing in order to speed this up. For instance: split the population by sex, sort the men by age, and build the subset of compatible women only once per distinct age.
EDIT: We can optionally optimize the matches by sorting the set of women by age difference, so that couples with a small age difference are created first.
Below is some example code.
var personArray = [];

// create test population
for (var n = 0; n < 30000; n++) {
    personArray.push({
        isSingle: Math.random() < 0.5,
        age: Math.round(18 + Math.random() * 80),
        sex: Math.random() < 0.5 ? 'M' : 'F',
        isAvailable: true
    });
}

var num = 7, // instead of num = normalRandomScaled(2.1, 12.44)
    sex = [[], []],
    curAge = -1, subset,
    houseHold = [],
    ts = performance.now();

// split population into men & women
personArray.forEach(function (p) {
    sex[p.sex == 'M' ? 0 : 1].push(p);
});

// sort men by age
sex[0].sort(function (a, b) { return a.age - b.age; });

// iterate on men
sex[0].forEach(function (m) {
    if (m.age != curAge) {
        // create set of matching women for this age
        subset = sex[1].filter(function (w) {
            return w.isAvailable && w.isSingle && Math.abs(m.age - w.age) <= num;
        });
        // sort by age difference, so that women with
        // a small age difference are picked first
        subset.sort(function (a, b) {
            return Math.abs(m.age - b.age) - Math.abs(m.age - a.age);
        });
        curAge = m.age;
    }
    if (m.isSingle && subset.length) {
        // pick woman from set
        var w = subset.pop();
        m.isAvailable = false; // (not really necessary)
        w.isAvailable = false;
        houseHold.push([m, w]);
    }
});

console.log(
    'Found ' + houseHold.length + ' matches ' +
    'in ' + Math.round(performance.now() - ts) + 'ms'
);
console.log(
    'Random example:',
    houseHold[(Math.random() * houseHold.length) | 0]
);
Upvotes: 1