RNA

Reputation: 1091

javascript - store regEx in a set object?

I made a simple crawler using simplecrawler :D

Its constructor has a Set object that holds visited URLs:

this.visited = new Set();

Any invalid URL will be added there:

this.visited.add(url);

Currently, when a new URL is added to the queue, I check whether it has been visited:

if (this.visited.has(newURL))

Can I put a regex in this Set to block URLs from a specific site, used as below?

// to block www.xxx.com/123, www.xxx.com/456, www.xxx.com/789
this.visited.add('/www\.xxx\.com\/\d/g');

if (this.visited.has(givenURL))
  // do not visit
else
  // visit

If this can be done, what would be the best way to get this done?

Upvotes: 0

Views: 459

Answers (1)

Kokogino

Reputation: 1066

You could loop over the Set and check whether the URL matches any item in it, treating each item as a regex pattern:

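this.visited = new Set();
var BreakException = {};
var url = 'www.xxx.com/123';
// Declared outside the callback so it is still visible after the loop.
var blocked = false;
// Entries are regex source strings, so backslashes are doubled.
this.visited.add('www\\.xxx\\.com/\\d+');
this.visited.add('www.xxx.com/123');
try {
    this.visited.forEach(function(x) {
        if (new RegExp(x).test(url)) {
            blocked = true;
            throw BreakException;
        }
    });
} catch (e) {
    // Only swallow our own marker; rethrow anything unexpected.
    if (e !== BreakException) throw e;
}
if (blocked) {
    // do not visit
} else {
    // visit
}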

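Pay attention to the pattern I added to the set. The one you used in the question wouldn't work: it is a plain string that includes the /.../g delimiters, and inside a string literal the backslashes have to be doubled. Also, Set.has only tests for an identical value; it never does pattern matching.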

An exception is thrown to exit the loop early because forEach (on a Set, as on an Array) doesn't support break. A for...of loop over the Set would allow a plain break instead.
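
If you'd rather avoid the exception, one possible alternative is to keep exact URLs in the Set and hold the patterns separately as real RegExp objects, then check both. This is only a minimal sketch: createBlocklist, isBlocked, and the www.yyy.com URL are names made up for illustration and are not part of simplecrawler.

```javascript
// Hypothetical helpers for illustration; not part of simplecrawler.
function createBlocklist() {
    return {
        exact: new Set(), // URLs compared by exact string equality
        patterns: []      // RegExp objects tested against each URL
    };
}

function isBlocked(list, url) {
    // Set.has only checks for an identical value, so patterns are tested separately.
    // Array.prototype.some stops at the first match, so no break is needed.
    return list.exact.has(url) ||
        list.patterns.some(function (re) { return re.test(url); });
}

var list = createBlocklist();
list.exact.add('www.yyy.com/visited'); // made-up URL for the exact-match case
// Regex literal: single backslashes here, and no g flag,
// because g makes test() stateful through lastIndex.
list.patterns.push(/www\.xxx\.com\/\d+/);

console.log(isBlocked(list, 'www.xxx.com/123'));     // true
console.log(isBlocked(list, 'www.xxx.com/about'));   // false
console.log(isBlocked(list, 'www.yyy.com/visited')); // true
```

Keeping the two collections apart means the exact lookups stay a constant-time Set.has, and only the (usually short) pattern list is scanned.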

Upvotes: 1
