Reputation: 10049
When it comes to regex I am dumber than a doornail, so when I was making a Firefox extension I asked a friend for help, and he gave me this:
if( doc.location.href.match(/(www\.google.*?[?&]q=[^&]+)/i) )
But the AMO editor rejected it, saying it was too broad (for instance, it would match http://uptime.netcraft.com/up/graph?site=www.google.com). Can someone help me by giving me a regex that matches the following? According to the editor, it should basically match this: http(s)://www.google.tld/q=*
So, for example, it should match http or https (normal and secure) as well as any TLD after Google (like .ru, .se, .fr, .in, etc.).
In other words it should only match Google search.
Thanks in advance for your help!
/Ryan
Upvotes: 1
Views: 113
Reputation: 48416
Try
/^https?:\/\/(?:www\.)?google(?:\.[a-z]{2,3}){1,2}\/.*[&\?]q=[^&]+?/i
The (?:\.[a-z]{2,3}){1,2} part is there to match suffixes like .com.au, .co.uk, etc.
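To see how this pattern behaves, here is a small check against a few sample URLs. The sample URLs and the helper name `isGoogleSearch` are made up for illustration; only the regex itself comes from the answer above.

```javascript
// Pattern copied from the answer above (trailing lazy quantifier kept as written).
const googleSearch = /^https?:\/\/(?:www\.)?google(?:\.[a-z]{2,3}){1,2}\/.*[&\?]q=[^&]+?/i;

// Hypothetical helper, not part of any Firefox extension API.
function isGoogleSearch(href) {
  return googleSearch.test(href);
}

console.log(isGoogleSearch('https://www.google.com/search?q=regex'));        // true
console.log(isGoogleSearch('http://www.google.co.uk/search?hl=en&q=regex')); // true
console.log(isGoogleSearch('http://uptime.netcraft.com/up/graph?site=www.google.com')); // false
```

The leading `^` is what rejects the netcraft URL: the string has to start with the scheme and the Google host, rather than merely contain them.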
Upvotes: 2
Reputation: 120516
Don't try to tailor a regular expression. It will be unmaintainable -- if you can't find the problem with it today, what hope does the maintainer have to find a problem with it tomorrow?
Parse the URL properly, perhaps by using the regular expression which won't need to be maintained because the core URL syntax doesn't change.
From RFC 3986:
The following line is the regular expression for breaking-down a well-formed URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to http://www.ics.uci.edu/pub/ietf/uri/#Related results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
Using that, you can check your URL in JavaScript by doing the following:
// url is the string to check, e.g. doc.location.href
var match = url.match(/^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$/);
if (!match) { throw new Error('not a URL'); }
// Use a separate name so the parsed parts do not overwrite the input string.
var parts = {
  protocol: match[2],
  authority: match[4],  // host, port, username, password
  path: match[5],
  query: match[6],      // includes the leading '?'
  fragment: match[8]    // includes the leading '#'
};
if (parts.protocol !== 'http' && parts.protocol !== 'https') {
  throw new Error('bad protocol');
}
// The dots are escaped so they only match a literal '.'
if (!/^www\.google\.[a-z]+$/i.test(parts.authority || '')) {
  throw new Error('bad host');
}
if (!/[?&]q=/.test(parts.query || '')) {
  throw new Error('bad query');
}
It's more code, but it's much easier to debug and maintain, and as a bonus you can tailor the error message to explain why the URL was rejected.
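As an alternative sketch of the same parse-then-check idea, environments that provide the WHATWG `URL` class (modern browsers and Node.js) can do the splitting for you. This swaps the RFC 3986 regex for the built-in parser, so it is not the approach quoted above; the function name `checkGoogleSearchUrl` and the returned reason strings are hypothetical.

```javascript
// Sketch: parse with the built-in URL class, then check each component.
// checkGoogleSearchUrl is a hypothetical name used only for this example.
function checkGoogleSearchUrl(href) {
  let parsed;
  try {
    parsed = new URL(href); // throws a TypeError on unparseable input
  } catch (e) {
    return 'not a URL';
  }
  // Unlike the regex capture, URL.protocol includes the trailing colon.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return 'bad protocol';
  }
  // Assumption: a single-label TLD such as .com or .se. Hosts like
  // www.google.co.uk would need a wider pattern.
  if (!/^www\.google\.[a-z]+$/.test(parsed.hostname)) {
    return 'bad host';
  }
  if (!parsed.searchParams.has('q')) {
    return 'bad query';
  }
  return 'ok';
}

console.log(checkGoogleSearchUrl('https://www.google.se/search?q=regex')); // ok
console.log(checkGoogleSearchUrl('http://uptime.netcraft.com/up/graph?site=www.google.com')); // bad host
```

Returning a reason string instead of throwing is just a choice for the sketch; it keeps the "explain why the URL was rejected" benefit while being easy to log.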
Upvotes: 2
Reputation: 4118
var regex = /^https?:\/\/(www\.)?google\.[a-z]{2,3}\/[^?#]*\?([^#]*&)?q=.+$/i;
Upvotes: 1
Reputation: 41934
Something like this?
/^https?:\/\/(www\.)?google\.[a-z]{2,3}\/.*[?&]q=.+/
Upvotes: 0
Reputation: 10090
Add ^https?:// to the front of the pattern you already have, so that the whole pattern becomes:
(^https?:\/\/www\.google.*?[?&]q=[^&]+)
What I like about the pattern you have is that it does not assume that TLDs are two or three characters long.
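A quick way to weigh that trade-off is to run the anchored pattern against a few URLs. The sample URLs below are invented for illustration, and the `i` flag is carried over from the original question. Note that because `google.*?` can match any characters, the pattern still accepts hosts that merely begin with `www.google`.

```javascript
// Anchored version of the pattern from the question.
const anchored = /(^https?:\/\/www\.google.*?[?&]q=[^&]+)/i;

// Made-up sample URLs for illustration.
const samples = [
  'https://www.google.fr/search?q=regex',
  'http://uptime.netcraft.com/up/graph?site=www.google.com',
  'http://www.google.example.org/?q=regex',
];

// Test each sample; the pattern has no g flag, so test() is stateless.
const results = samples.map((href) => anchored.test(href));
console.log(results); // [ true, false, true ]
```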
Upvotes: 1
Reputation: 53482
^https?://www\.google\.[a-z]{2,3}/q=
This assumes that allowing just 2-3 letters for the TLD is acceptable. If you're using it as a regex literal between forward slashes (/), you'll need to escape the slashes inside the pattern.
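The escaping point can be illustrated with two equivalent ways of writing this pattern in JavaScript. The variable names and the sample URL are hypothetical.

```javascript
// As a regex literal, each forward slash in the pattern must be escaped.
const literal = /^https?:\/\/www\.google\.[a-z]{2,3}\/q=/;

// As a string passed to the RegExp constructor, slashes need no escaping,
// but each backslash must be doubled inside the string literal.
const fromString = new RegExp('^https?://www\\.google\\.[a-z]{2,3}/q=');

console.log(literal.test('https://www.google.se/q=regex'));    // true
console.log(fromString.test('https://www.google.se/q=regex')); // true
```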
Upvotes: 2