Reputation: 2465
The GreaseSpot page on metadata blocks says that the two are very similar but @match "sets more strict rules on what the * character means." GreaseSpot then proceeds to teach using @include, but Chrome examples like this generally seem to use @match and indicate that @include is only supported for compatibility purposes; @match is preferred.
Apparently, @include google.* can run on google.evil.com while @match google.* cannot.
That one example is not sufficient to really see how the wildcards behave differently between these two, and better explanations are sought in answers here.
New GreaseMonkey scripts (Firefox) use @include by default while new TamperMonkey scripts (for e.g. Chrome) use @match by default.
What exactly are the differences between these two?
For example, how does each one handle wildcards?
Are there differences in cross-browser compatibility?
What reasons would someone have for choosing to use one over the other?
Upvotes: 66
Views: 31735
Reputation: 30
In userscripts, @include and @match are both metadata directives used to specify the URLs of web pages where the userscript should run. However, they have some differences in how they match URLs.
@include specifies a URL pattern where the userscript should run; a script can list several of them.
You can use wildcards like * to match multiple URLs or parts of URLs.
It is more flexible and also accepts regular expressions.
@match also specifies a URL pattern where the userscript should run, but with stricter rules.
It is less flexible than @include: the * wildcard is supported, but it is interpreted separately for the scheme, host, and path, and regular expressions are not supported.
It's typically used when a script should be tied to specific hosts.
Upvotes: -2
Reputation: 3087
The most important difference is that @match is much more rigidly structured and restrictive than @include, which makes it the more "generally" secure (and preferred) variant. @match can be a little more complicated to use overall due to this rigidity, but @include may generate scarier warnings to the end user because it's easier to misuse.
The practical usage of the two can vary widely; the full breakdown of usage for each follows below.
@include (and @exclude)
@include might be the directive most people are more familiar with (along with its opposing twin, @exclude, which has exactly the same syntax features). This is the more powerful and flexible directive compared to @match, largely because it can handle RegEx patterns. Its usage is also the most straightforward.
You can specify @include patterns in two ways, or "modes":
In "glob mode", asterisks * can be used as a wildcard glob to signify that any number of characters, including zero, is allowed in a given spot in the pattern. Via the GreaseMonkey docs:
For example: http://www.example.com/foo/* will match http://www.example.com/foo/bar and http://www.example.com/foo/ but not http://www.example.com/baz/.
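As a rough sketch of how glob mode behaves, the snippet below converts a glob into a regular expression and checks the three URLs from the quote. The helper name globToRegExp is made up for illustration, and real userscript managers may implement this differently.

```javascript
// Sketch only: approximates glob-mode @include matching.
// globToRegExp is a hypothetical helper, not a userscript-manager API.
function globToRegExp(glob) {
  // Escape every regex metacharacter, then turn each escaped "*" into ".*".
  const escaped = glob.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$");
}

const includePattern = globToRegExp("http://www.example.com/foo/*");

console.log(includePattern.test("http://www.example.com/foo/bar")); // true
console.log(includePattern.test("http://www.example.com/foo/"));    // true
console.log(includePattern.test("http://www.example.com/baz/"));    // false
```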
There's also a special pattern available just for @include that will match any top-level domain suffix: .tld. A pattern like @include https://www.example.tld/* will match the given domain with any valid, public TLD suffix, such as .com, .org, or .co.uk.
@include directives that start with a forward slash / will be interpreted as a regular expression, with all standard JavaScript RegEx features available:
// ==UserScript==
// @include /^https?://www\.example\.com/.*$/
// @include /^http://www\.example\.(?:org|net)//
// ==/UserScript==
A few notes:
- Forward slashes / are not required to be escaped inside expressions.
- @include patterns are always treated as case-insensitive.
- Expressions that do not end with the end-of-line anchor $ will implicitly allow trailing characters on matches.
  - In effect, such patterns behave as if they ended with .*
  - @include /^https?://www\.google\.com/search/ will match https://www.google.com/search?q=stackoverflow.

Keep in mind that the powerful & wide-encompassing nature of @include means that a browser cannot guarantee the target of a given script as well as it can with @match. This means that scripts using @include may trigger severe-sounding warnings for the user in some cases.
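To make the regex-mode notes a bit more concrete, here is a small sketch of how such a directive value could be compiled and tested. parseRegexInclude is a hypothetical helper written for this answer; it assumes the value is simply stripped of its delimiting slashes and compiled with the case-insensitive flag.

```javascript
// Sketch only: strip the delimiting slashes and compile the rest with "i".
// parseRegexInclude is a hypothetical helper, not a userscript-manager API.
function parseRegexInclude(value) {
  if (value.length < 2 || !value.startsWith("/") || !value.endsWith("/")) {
    throw new Error("not a regex-mode @include value");
  }
  return new RegExp(value.slice(1, -1), "i");
}

const search = parseRegexInclude(String.raw`/^https?://www\.google\.com/search/`);
const anchored = parseRegexInclude(String.raw`/^https?://www\.google\.com/search$/`);
const url = "https://www.google.com/search?q=stackoverflow";

console.log(search.test(url));               // true: no $ anchor, so trailing text is fine
console.log(search.test(url.toUpperCase())); // true: matching ignores case
console.log(anchored.test(url));             // false: $ forbids the query string
```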
One of the most cited dangers of using @include is unintentional (or maliciously intentional) URL matching; this can occur when @include patterns aren't scoped or understood properly, or when a bad actor crafts a URL to specifically trigger a script where it isn't intended to run.
Since non-RegEx wildcards can match any characters, anywhere in a URL, seemingly simple patterns can have unexpected matches. For example, one might expect *://example.net/* to only match URLs belonging to the example.net domain, but it will also match https://evil.com/?http://example.net/!
Some userscript managers have built-in protections to help mitigate attack vectors like these, but the possibility still exists, which makes @include potentially more dangerous than @match, which is designed to be largely immune to this style of attack.
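The crafted-URL problem is easy to reproduce. The sketch below treats * as "any run of characters" (globMatches is a hypothetical helper, and it ignores any mitigations a real manager may add), then compares that with checking the parsed hostname.

```javascript
// Sketch only: "*" matches any run of characters, anywhere in the URL.
// globMatches is a hypothetical helper, not a userscript-manager API.
function globMatches(glob, url) {
  const source = glob
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + "$").test(url);
}

const pattern = "*://example.net/*";
const intended = "https://example.net/page";
const crafted = "https://evil.com/?http://example.net/";

console.log(globMatches(pattern, intended)); // true
console.log(globMatches(pattern, crafted));  // true, although the host is evil.com

// Comparing the parsed hostname instead rejects the crafted URL.
console.log(new URL(crafted).hostname === "example.net"); // false
```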
@match
The @match directive is a creation of Google for Chrome, designed to be a safer, more sandboxed version of the @include directive, with much more rigidity built-in.
Instead of allowing globs or RegEx, @match interprets a pattern as 3 parts: the scheme, the host, and the path. Google's documentation describes the basic syntax this way:
<url-pattern> := <scheme>://<host><path>
<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp' | 'urn'
<host> := '*' | '*.' <any char except '/' and '*'>+
<path> := '/' <any chars>
Each part of the pattern carries its own caveats, and also interprets wildcards * differently.
The scheme portion of the URL pattern must either exactly match a scheme supported by the browser or be the wildcard *. Note, however, that the wildcard does not allow all schemes, but instead matches just http and https.
Browser | Schemes Supported in Match Patterns |
---|---|
Chrome | http, https, file, ftp, or urn |
Firefox | http, https, file, ftp, ws, wss, data, or (chrome-)extension |
Safari | At least [1] http and https |

[1] The available documentation confirms http and https for Safari, but it lacks a comprehensive list for other browsers (e.g. urn is missing, which Chrome supports), so Safari may still support other schemes.

A caveat to the wildcard here is that in Firefox specifically (and potentially others, but notably not Chrome or Safari), the wildcard will also match WebSocket schemes ws and wss.
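The host portion of the URL pattern can come in three styles:
- A full hostname, matched exactly: www.stackoverflow.com
- A wildcard subdomain prefix, matching the domain and any of its subdomains: *.stackoverflow.com
- A bare wildcard, matching any host: *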
The top-level domain suffix cannot be a wildcard (e.g. www.stackoverflow.*); this is disallowed for security reasons. In order to match multiple TLD suffixes, a script will need to include a specific @match directive for each.
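In practice that just means repeating the directive. A minimal header might look like the following; the script name and the example.* domains are placeholders, not taken from any real script.

```javascript
// ==UserScript==
// @name   Multi-TLD example (hypothetical script)
// @match  https://www.example.com/*
// @match  https://www.example.org/*
// @match  https://www.example.co.uk/*
// ==/UserScript==
```

Each line is checked independently, so the script runs when any one of them matches.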
The path portion of the URL pattern is the most permissive, as the only rule is that it must start with a forward slash /. The rest can be any combination of characters and wildcards.
In this section, wildcards * act as a standard glob operator, simply matching 0 or more characters.
The value that gets matched against the path portion of the pattern is officially the URL path plus the URL query string, including the ? between them (e.g. in google.com/search?q=test, the query string is q=test). This is a potential pitfall for patterns that aim to match the end of a given domain, since they may be foiled by an added query string.
Also note that the path does not include URL fragments (the part of the URL at the end that follows a hash #, e.g. www.example.com#main); @match directives ignore URL fragments by design to prevent abuse of unintentional matches.
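Putting the three parts together, here is a simplified sketch of how a match pattern could be evaluated. It is not Chrome's or any manager's actual code: matchPatternToTester is a hypothetical helper, and it assumes only the *, http, and https schemes (file, ftp, and urn are left out to keep it short).

```javascript
// Sketch only: evaluates <scheme>://<host><path> for "*", "http", and "https".
// matchPatternToTester and escapeRegExp are hypothetical helpers.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matchPatternToTester(pattern) {
  const parts = /^(\*|https?):\/\/(\*|\*\.[^/*]+|[^/*]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`invalid or unsupported match pattern: ${pattern}`);
  const [, scheme, host, path] = parts;

  // Path: "*" is a plain glob, tested against pathname + query string.
  const pathRegExp = new RegExp("^" + path.split("*").map(escapeRegExp).join(".*") + "$");

  return (href) => {
    const url = new URL(href);
    const urlScheme = url.protocol.slice(0, -1); // "https:" -> "https"
    // Scheme: "*" means only http or https, not every scheme.
    const schemeOk = scheme === "*" ? ["http", "https"].includes(urlScheme) : scheme === urlScheme;
    // Host: "*" is any host; "*.x" is x or a subdomain of x; otherwise exact.
    const hostOk =
      host === "*" ||
      (host.startsWith("*.")
        ? url.hostname === host.slice(2) || url.hostname.endsWith(host.slice(1))
        : url.hostname === host);
    // The fragment (url.hash) is deliberately left out of the comparison.
    return schemeOk && hostOk && pathRegExp.test(url.pathname + url.search);
  };
}

const anySO = matchPatternToTester("*://*.stackoverflow.com/*");
console.log(anySO("https://meta.stackoverflow.com/questions"));      // true
console.log(anySO("https://evil.com/?https://stackoverflow.com/"));  // false
console.log(anySO("ftp://stackoverflow.com/"));                      // false

const exactSearch = matchPatternToTester("https://www.google.com/search");
console.log(exactSearch("https://www.google.com/search#main"));   // true: fragment ignored
console.log(exactSearch("https://www.google.com/search?q=test")); // false: query string counts
```

The last two lines show the query-string pitfall: a path without a trailing wildcard stops matching as soon as a query string is added.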
It's fairly obvious, but it bears repeating that scripts should be careful to @include exactly and exclusively the URLs that the script is intended to be run on. Runaway scripts can range from minor annoyances to major problems; always double check that scripts are running only where they're supposed to be, and use @exclude to add guardrails if necessary or convenient.
Upvotes: 24
Reputation: 2465
You cannot use regular expressions with @match, while you can with @include.
However, @include will give your users scarier security warnings about the script applying to all sites.
This is even though an @include expression permits you to be more restrictive about the sites a script applies to (e.g. specifying that part of a URL be numeric using the regex fragment [0-9]+, or using ^https?:// to apply a script to just those two schemes, instead of the more general non-regex globbing operator * used for each of those cases in @match, which causes the script to apply more broadly).
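To illustrate the numeric case, here is a quick comparison. The www.example.com site and its /questions/ URL layout are invented for the example, and the @match path glob is hand-translated into a regex here just for the comparison.

```javascript
// Hypothetical site and URL layout, for illustration only.
// @include /^https?://www\.example\.com/questions/[0-9]+/
const includeRegExp = new RegExp(
  String.raw`^https?://www\.example\.com/questions/[0-9]+`,
  "i"
);

// @match *://www.example.com/questions/*  (path glob written out as a regex)
const matchPathGlob = /^\/questions\/.*$/;

const numeric = new URL("https://www.example.com/questions/12345");
const other = new URL("https://www.example.com/questions/ask");

console.log(includeRegExp.test(numeric.href));   // true
console.log(includeRegExp.test(other.href));     // false: [0-9]+ demands digits
console.log(matchPathGlob.test(other.pathname)); // true: the glob cannot demand digits
```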
Upvotes: 39