usertest

Reputation: 27628

Detecting International Characters In Regular Expressions

Here's a regular expression to detect product pages on Amazon. It works for pages whose titles are plain English, but not for URLs that contain international characters, so URL2 is not detected. How do I get around this? Thanks.

var URL1 = "http://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "http://www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

var regex1 = RegExp("http://www.amazon.(com|co.uk|de|ca|it|fr|cn|co.jp)/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
var m = URL1.match(regex1);

Upvotes: 1

Views: 289

Answers (1)

ikegami

Reputation: 385590

\w doesn't match %, so the percent-encoded segment Larm%C3%A9e-furieuse-Fred-Vargas doesn't match [\w-]+. Why not just use [^/]+, which accepts anything up to the next slash?
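To see the difference in isolation, here is a minimal sketch that tests both character classes against the one path segment from URL2. The variable names are only for illustration.

```javascript
// The path segment from URL2 in the question.
var segment = "Larm%C3%A9e-furieuse-Fred-Vargas";

// \w is [A-Za-z0-9_], so the "%" of a percent-escape is rejected.
var wordOnly = /^[\w-]+$/.test(segment);

// [^/]+ accepts any run of characters that contains no slash.
var anyButSlash = /^[^\/]+$/.test(segment);

console.log(wordOnly, anyButSlash); // false true
```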

PS: an unescaped "." matches any character, so to match a literal dot you should use \., which is written as \\. inside the string literal.

RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
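A quick way to check this pattern is to run it against both URLs. This is only a sketch: it assumes the URLs start with http://, as the pattern requires, and the names regex2, looseHost and strictHost are made up for the example.

```javascript
// Assumption: both URLs carry the "http://" prefix that the pattern requires.
var URL1 = "http://www.amazon.com/Big-Short-Inside-Doomsday-Machine/dp/0393338827/";
var URL2 = "http://www.amazon.fr/Larm%C3%A9e-furieuse-Fred-Vargas/dp/2878583760/";

// The pattern from the answer, unchanged.
var regex2 = new RegExp("http://www\\.amazon\\.(ca|cn|co\\.(jp|uk)|com|de|fr|it)/([^/]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");

var m1 = URL1.match(regex2);
var m2 = URL2.match(regex2);

// The 10-character product ID is the last capture group.
var id1 = m1 && m1[m1.length - 1];
var id2 = m2 && m2[m2.length - 1];

// Effect of escaping the dots, shown on a look-alike host name.
var looseHost = new RegExp("http://www.amazon.com/").test("http://wwwXamazonYcom/");
var strictHost = new RegExp("http://www\\.amazon\\.com/").test("http://wwwXamazonYcom/");

console.log(id1, id2);              // 0393338827 2878583760
console.log(looseHost, strictHost); // true false
```

Note that co\\.(jp|uk) adds a capture group, so the groups after it are numbered one higher than in the original pattern. Reading the last group, as above, avoids depending on the exact index.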

Upvotes: 1
