Pete Williams

Reputation: 221

JavaScript negative lookbehind issue

I've got some JavaScript that looks for Amazon ASINs within an Amazon link, for example

http://www.amazon.com/dp/B00137QS28

For this I use the following regex: /([A-Z0-9]{10})

However, I don't want it to match artist links which look like:

http://www.amazon.com/Artist-Name/e/B000AQ1JZO

So I need to exclude any links where there's a '/e' before the slash and the 10-character alphanumeric code. I thought the following would do that: (?<!/e)([A-Z0-9]{10}), but it turns out negative lookbehinds don't work in JavaScript. Is that right? Is there another way to do this instead?

Any help would be much appreciated!

As a side note, there are plenty of Amazon link formats, which is why I want to blacklist rather than whitelist. For example, these all point to the same page:

http://www.amazon.com/gp/product/B00137QS28/
http://www.amazon.com/dp/B00137QS28
http://www.amazon.com/exec/obidos/ASIN/B00137QS28/
http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/

Upvotes: 3

Views: 796

Answers (3)

Mike Samuel

Reputation: 120586

([A-Z0-9]{10}) will work equally well on the reverse of its input, so you can

  1. reverse the string,
  2. use negative lookahead to reject a following e/ (the reversed /e),
  3. reverse it back.
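The steps above might be sketched as follows. This is only an illustration, not necessarily what the answer had in mind: the helper names reverse and extractAsin are made up here, and the reversed pattern assumes plain ASCII URLs.

```javascript
// Reverse a string. Fine for ASCII URLs; not safe for surrogate pairs.
function reverse(s) {
  return s.split('').reverse().join('');
}

// Hypothetical helper: returns the ASIN found in `url`,
// or null when none is found (for example, artist /e/ links).
function extractAsin(url) {
  // In the reversed URL the ID comes first, followed by "/".
  // The negative lookahead rejects "e/" (the reversed "/e") right after it.
  var m = reverse(url).match(/([A-Z0-9]{10})\/(?!e\/)/);
  return m ? reverse(m[1]) : null;
}

console.log(extractAsin('http://www.amazon.com/dp/B00137QS28'));            // "B00137QS28"
console.log(extractAsin('http://www.amazon.com/Artist-Name/e/B000AQ1JZO')); // null
```

The lookbehind the question wanted becomes a lookahead once the input is reversed, which is why this works in engines without lookbehind support.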

Upvotes: 2

J. K.

Reputation: 8368

You can use a negative lookahead to filter out the /e/ links. Each match then includes the two characters and the slash that precede the ID, so trim those three leading characters from each match.

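var source = 'http://www.amazon.com/dp/B00137QS28'; // example input; use the text you're matching against
// The lookahead rejects positions where the two characters before the slash are "/e".
var matches = source.match(/(?!\/e)..\/[A-Z0-9]{10}/g) || [];
var ids = matches.map(function (match) {
  return match.slice(3); // drop the two leading characters and the slash
});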

Upvotes: 0

Qtax

Reputation: 33928

In your case an expression like this would work:

/(?!\/e)..\/([A-Z0-9]{10})/
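As a rough usage sketch, the expression can be run against the URLs from the question, reading the ASIN from capture group 1. The names asinPattern and findAsin are hypothetical, introduced only for this example.

```javascript
// The expression from the answer: the lookahead fails when the two
// characters before the slash are "/e", so artist links never match.
var asinPattern = /(?!\/e)..\/([A-Z0-9]{10})/;

// Hypothetical helper: returns the captured ASIN, or null if nothing matches.
function findAsin(url) {
  var m = asinPattern.exec(url);
  return m ? m[1] : null;
}

var urls = [
  'http://www.amazon.com/gp/product/B00137QS28/',
  'http://www.amazon.com/dp/B00137QS28',
  'http://www.amazon.com/exec/obidos/ASIN/B00137QS28/',
  'http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/',
  'http://www.amazon.com/Artist-Name/e/B000AQ1JZO'
];

var asins = urls.map(findAsin);
console.log(asins);
// The first four entries are "B00137QS28"; the artist link yields null.
```

Note that the pattern consumes the two characters before the slash, so it only matches when at least two characters precede it, which is always the case for full URLs like these. Engines that implement ES2018 do accept lookbehind, so something like /(?<!\/e)\/([A-Z0-9]{10})/ may also be an option there.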

Upvotes: 3
