Reputation: 958
I have a proxy management pool that is responsible for storing, checking, and retrieving proxies so that they can be used with web requests.
```js
async getNextAvailableProxy() {
    while (true) {
        var sleepTime = global.Settings.ProxyPool.ProxySleepTimeMS;
        // Find every enabled, idle proxy that has rested long enough, fastest first.
        var availableProxies = await this.data.find(PROXY_COLLECTION, {
            $query: {
                Enabled: true,
                InUse: false,
                LastUsed: { $lte: new Date(Date.now() - sleepTime) }
            },
            $orderby: { ResponseTime: 1 }
        });
        if (availableProxies.length <= 0) {
            // Nothing has rested long enough; wait until one becomes available.
            var nextAvailable = await this.data.findOne(PROXY_COLLECTION, {
                $query: { Enabled: true, InUse: false },
                $orderby: { LastUsed: -1 }
            });
            if (!nextAvailable) {
                await Utils.sleep(100);
                console.log('No proxies available, sleeping');
                continue;
            }
            // Sleep only for the remaining rest time of that proxy.
            sleepTime = sleepTime - (Date.now() - nextAvailable.LastUsed);
            if (sleepTime > 0)
                await Utils.sleep(sleepTime);
            continue;
        }
        var selectedProxy = availableProxies[0];
        selectedProxy.InUse = true;
        await this.data.save(PROXY_COLLECTION, selectedProxy);
        return selectedProxy;
    }
}
```
It is worth noting that my versions of `find` and `save` are wrappers around the MongoDB driver for Node.js.
It is also worth noting that `Utils.sleep()` returns a promise that uses `setTimeout` to perform an async sleep.
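For reference, a promise-based sleep along those lines is typically a one-liner (a sketch of what `Utils.sleep()` presumably does, since its source isn't shown):

```js
// Assumed shape of Utils.sleep(): resolve a promise after `ms` milliseconds,
// pausing an async function without blocking the event loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```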
Now, I understand that since Node.js is single-threaded, race conditions cannot occur within a single process. However, when multiple isolated objects query the database in rapid succession, that guarantee no longer holds.
If I have, say, five instances of `ProxyPool` and they all call `getNextAvailableProxy()` within a short time of each other, they will all fetch the same proxy from the database, because one instance starts its query before another instance has saved the `InUse` flag, leaving me with *n* instances of `ProxyPool` all holding the exact same proxy.
How can I get around this in an asynchronous manner?
Upvotes: 0
Views: 692
Reputation: 36329
Honestly, it's hard to tell why it's a problem based on your post. While collisions can happen, it should be rare enough not to matter in my opinion, unless the use of the proxy is a really long running operation (and so a given proxy is tied up a lot).
That said, I also would not lookup a proxy on every request. Instead, I'd probably have each worker fetch a pool of proxies either on startup or at intervals (maybe once an hour or something), and then internally manage (in-memory) the proxies it has available.
Your algorithm for figuring out which proxies to give a given worker can then be pretty flexible, and a lot less likely to have collisions: since each Node instance is single-threaded, a worker won't allocate the same proxy twice from its own in-memory pool.
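As a rough sketch of that approach (class and field names here are illustrative, not from your code): each worker keeps its reserved proxies in a local array and marks them busy purely in memory, so acquisition is synchronous and cannot collide within the process:

```js
// Illustrative per-worker, in-memory proxy pool. The batch of proxies
// would come from a periodic DB query that reserves them for this worker.
class LocalProxyPool {
    constructor(proxies) {
        // Track in-use state locally; no DB round trip per request.
        this.proxies = proxies.map((p) => ({ ...p, inUse: false }));
    }

    // Synchronous within one Node process, so two callers can never
    // receive the same proxy: the event loop serializes these calls.
    acquire() {
        const proxy = this.proxies.find((p) => !p.inUse);
        if (!proxy) return null; // pool exhausted; caller backs off or returns Too Busy
        proxy.inUse = true;
        return proxy;
    }

    release(proxy) {
        proxy.inUse = false;
    }
}
```

The `acquire()`/`release()` pair is deliberately synchronous; only the periodic refill of the pool needs to touch the database.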
The risk is that you may hit a place where a given worker has run out of proxies. That's something you'll need to handle as well, but since you will (in theory) have your workers load balanced in some fashion, if you hit that spot you're probably running out of proxies anyway and will have to issue a Too Busy response soon.
Finally, when you do hit the DB for a list of available proxies, you should be using findAndModify() or similar to fetch and update the documents in one shot, so that as you pull one out of the DB you tell the DB it's not available, rather than waiting on processing on your web server first.
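With the Node.js MongoDB driver that atomic fetch-and-claim maps to `findOneAndUpdate`. A sketch, reusing the field names from the question (the filter/update builders are plain objects; the driver call itself is commented out because it needs a live connection, and depending on driver version the result is the document itself or wrapped in `{ value }`):

```js
// Build the filter: enabled, not in use, and rested long enough.
function buildClaimFilter(sleepTimeMs, now = Date.now()) {
    return {
        Enabled: true,
        InUse: false,
        LastUsed: { $lte: new Date(now - sleepTimeMs) },
    };
}

// The update marks the proxy as claimed in the same atomic operation,
// so no second caller can match it between the find and the save.
const claimUpdate = { $set: { InUse: true } };

// Against a real driver collection this would look roughly like:
// const proxy = await collection.findOneAndUpdate(
//     buildClaimFilter(sleepTimeMs),
//     claimUpdate,
//     { sort: { ResponseTime: 1 } } // fastest proxy first
// );
// A null result means nothing matched, so the caller sleeps and retries.
```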
Upvotes: 1