javascript greasemonkey userscripts tampermonkey

Reputation: 2465

What is the difference between @include and @match in userscripts?

The GreaseSpot page on metadata blocks says that the two are very similar but @match "sets more strict rules on what the * character means." GreaseSpot then proceeds to teach using @include, but Chrome examples like this generally seem to use @match and indicate that @include is only supported for compatibility purposes; @match is preferred.

Apparently, @include google.* can run on google.evil.com while @match google.* cannot.
That one example is not sufficient to really see how the wildcards behave differently between these two, and better explanations are sought in answers here.

New GreaseMonkey scripts (Firefox) use @include by default while new TamperMonkey scripts (for e.g. Chrome) use @match by default.

What exactly are the differences between these two?

For example, how does each one handle wildcards?
Are there differences in cross-browser compatibility?
What reasons would someone have for choosing to use one over the other?

Upvotes: 66

Answers (3)

Oladetoun Gbemi

Reputation: 30

In userscripts, @include and @match are both metadata directives used to specify the URLs of web pages where the userscript should run. However, they have some differences in how they match URLs.

@include specifies a list of URLs where the userscript should run. You can use wildcards like * to match multiple URLs or parts of URLs. It is more flexible and allows for pattern matching.

@match specifies a single URL pattern where the userscript should run. It is less flexible than @include because it doesn't support wildcards or pattern matching. It's typically used for exact URL matches.

Upvotes: -2

zcoop98

Reputation: 3087

TL;DR: Rigidity

The most important difference is that @match is much more rigidly structured and restrictive than @include, which makes it the more "generally" secure (and preferred) variant. @match can be a little more complicated to use overall due to this rigidity, but @include may generate scarier warnings to the end user because it's easier to misuse.

The practical usage of the two can vary widely; the full breakdown of usage for each follows below.

`@include` (and `@exclude`)

@include is probably the directive most people are familiar with (along with its opposing twin, @exclude, which has exactly the same syntax features). It is more powerful and flexible than @match, largely because it can handle RegEx patterns. Its usage is also the most straightforward.

Modes

You can specify @include patterns in two ways, or "modes":

Glob Mode

In "glob mode", an asterisk * can be used as a wildcard glob to signify that any number of characters, including zero, is allowed at a given spot in the pattern. Via the GreaseMonkey docs:

For example: http://www.example.com/foo/* will match:

http://www.example.com/foo/bar and,

http://www.example.com/foo/

but not:

http://www.example.com/baz/.
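To make the glob behavior concrete, here is a minimal sketch of how a glob could be translated into a regular expression. The helper name globToRegExp is hypothetical and this is not any userscript manager's actual code; it only assumes that * means "zero or more of any character" and that matching is case-insensitive.

```javascript
// Hypothetical helper: translate an @include glob into a RegExp.
// Assumption: '*' means "zero or more of any character"; everything else is literal.
function globToRegExp(glob) {
  // Escape every RegExp metacharacter except '*', then turn each '*' into '.*'.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

const pattern = globToRegExp("http://www.example.com/foo/*");
console.log(pattern.test("http://www.example.com/foo/bar")); // true
console.log(pattern.test("http://www.example.com/foo/"));    // true
console.log(pattern.test("http://www.example.com/baz/"));    // false
```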

There's also a special pattern available just for @include that will match any top-level domain suffix: .tld. A pattern like @include https://www.example.tld/* will match the given domain with any valid, public TLD suffix, such as .com, .org, or .co.uk.

Regular Expression Mode

@include directives that start with a forward slash / will be interpreted as a regular expression, with all standard JavaScript RegEx features available:

// ==UserScript==
// @include     /^https?://www\.example\.com/.*$/
// @include     /^http://www\.example\.(?:org|net)//
// ==/UserScript==

A few notes:

Due to JavaScript's RegEx interpretation, forward slashes / are not required to be escaped inside expressions.
Other special characters still need to be escaped.
@include patterns are always treated as case-insensitive.
Expressions not ending with the EOL token $ will implicitly allow trailing characters on matches.
- I.e. the expression is treated as if it ended with .*.
- @include /^https?://www\.google\.com/search/ will match https://www.google.com/search?q=stackoverflow.
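The notes above can be checked with a small sketch. It assumes a manager strips the leading and trailing slash and compiles the rest with the case-insensitive flag; includeToRegExp is a hypothetical name, not a real manager API.

```javascript
// Hypothetical helper: interpret an @include value that is written in regex mode.
// Assumption: the text between the first and last '/' is compiled with the "i" flag.
function includeToRegExp(value) {
  if (value.length > 1 && value.startsWith("/") && value.endsWith("/")) {
    return new RegExp(value.slice(1, -1), "i");
  }
  return null; // glob mode is out of scope for this sketch
}

// String.raw keeps the backslashes exactly as they appear in the metadata block.
const open = includeToRegExp(String.raw`/^https?://www\.google\.com/search/`);
const anchored = includeToRegExp(String.raw`/^https?://www\.google\.com/search$/`);

const url = "https://www.google.com/search?q=stackoverflow";
console.log(open.test(url));                             // true: no $ so trailing text is allowed
console.log(anchored.test(url));                         // false: $ forbids the query string
console.log(open.test("HTTPS://WWW.GOOGLE.COM/search")); // true: case-insensitive
```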

Warnings

Keep in mind that the powerful & wide-encompassing nature of @include means that a browser cannot guarantee the target of a given script as well as it can with @match. This means that scripts using @include may trigger severe-sounding warnings for the user in some cases.

One of the most cited dangers of using @include is unintentional (or maliciously intentional) URL matching; this can occur when @include patterns aren't scoped or understood properly, or when a bad actor crafts a URL to specifically trigger a script where it isn't intended to run.

Since non-RegEx wildcards can match any characters, anywhere in a URL, seemingly simple patterns can have unexpected matches. For example, one might expect *://example.net/* to only match URLs belonging to the example.net domain, but it will also match https://evil.com/?http://example.net/!
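A quick way to see this is to translate the glob naively and test the crafted URL. This sketch assumes * simply becomes .* (the helper name is hypothetical); a real manager may add protections on top.

```javascript
// Hypothetical naive glob translation: '*' becomes '.*', everything else is literal.
function naiveGlob(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

const include = naiveGlob("*://example.net/*");
const crafted = "https://evil.com/?http://example.net/";

console.log(include.test("https://example.net/page")); // true, as intended
console.log(include.test(crafted));                    // true, although the page is on evil.com
console.log(new URL(crafted).hostname);                // "evil.com"
```

The first * swallows "https://evil.com/?http", which is why the pattern is satisfied even though the actual host is different.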

Some userscript managers have built-in protections to help mitigate attack vectors like these, but the possibility still exists, which makes @include potentially more dangerous than @match, which is designed to be largely immune to this style of attack.

`@match`

The @match directive is a creation of Google for Chrome, designed to be a safer, more sandboxed version of the @include directive, with much more rigidity built-in.

Instead of allowing free-form globs or RegEx, @match interprets a pattern as three parts: the scheme, the host, and the path. Google's documentation describes the basic syntax this way:

<url-pattern> := <scheme>://<host><path>
<scheme> := '*' | 'http' | 'https' | 'file' | 'ftp' | 'urn'
<host> := '*' | '*.' <any char except '/' and '*'>+
<path> := '/' <any chars>

Each part of the pattern carries its own caveats, and also interprets wildcards * differently.
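As a rough illustration of the three-part structure, the sketch below splits a pattern into scheme, host, and path and rejects patterns that break the grammar. parseMatchPattern is a hypothetical helper; it leaves out urn and is not a browser's real parser.

```javascript
// Hypothetical helper: split an @match pattern into scheme, host, and path.
// Assumption: only '*', http, https, file, and ftp schemes are handled here.
function parseMatchPattern(pattern) {
  const m = /^(\*|https?|file|ftp):\/\/([^\/]*)(\/.*)$/.exec(pattern);
  if (!m) return null; // missing scheme, "://", or the leading "/" of the path
  const [, scheme, host, path] = m;
  const hostOk =
    host === "*" ||                        // fully wildcard
    /^(\*\.)?[^*]+$/.test(host) ||         // explicit host, or "*." plus a host
    (scheme === "file" && host === "");    // file:///... has an empty host
  return hostOk ? { scheme, host, path } : null;
}

console.log(parseMatchPattern("*://*.stackoverflow.com/*"));
// { scheme: '*', host: '*.stackoverflow.com', path: '/*' }
console.log(parseMatchPattern("https://www.stackoverflow.*/")); // null: wildcard TLD
console.log(parseMatchPattern("https://example.com"));          // null: no path
```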

Scheme

The scheme portion of the URL pattern must either exactly match a scheme supported by the browser or be the wildcard *. Note, however, that the wildcard does not allow all schemes, but instead matches just http and https.

Browser	Schemes Supported in Match Patterns
Chrome	`http`, `https`, `file`, `ftp`, or `urn`
Firefox	`http`, `https`, `file`, `ftp`, `ws`, `wss`, `data`, or (`chrome-`)`extension`
Safari	At least¹ `http` and `https`

¹ I can't find a comprehensive reference on what schemes Safari supports in manifests. Mozilla tracks it as missing all but http and https, but they lack a comprehensive list for other browsers (e.g. urn is missing, which Chrome supports), so Safari may still support other schemes.

A caveat to the wildcard here is that in Firefox specifically (and potentially others, but notably not Chrome or Safari), the wildcard will also match WebSocket schemes ws and wss.
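The scheme rule is small enough to sketch directly. schemeMatches and its third parameter are hypothetical; the flag only models the Firefox behavior described above.

```javascript
// Hypothetical helper: does the scheme part of an @match pattern accept a URL scheme?
// wildcardIncludesWebSockets models the Firefox caveat (ws and wss also match '*').
function schemeMatches(patternScheme, urlScheme, wildcardIncludesWebSockets = false) {
  if (patternScheme === "*") {
    const allowed = wildcardIncludesWebSockets
      ? ["http", "https", "ws", "wss"]
      : ["http", "https"];
    return allowed.includes(urlScheme);
  }
  return patternScheme === urlScheme; // otherwise the scheme must match exactly
}

console.log(schemeMatches("*", "https"));     // true
console.log(schemeMatches("*", "ftp"));       // false: '*' is not "any scheme"
console.log(schemeMatches("ftp", "ftp"));     // true
console.log(schemeMatches("*", "wss", true)); // true under the Firefox-style flag
```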

Host

The host portion of the URL pattern can come in three styles:

Fully explicit: www.stackoverflow.com
Subdomain wildcard: *.stackoverflow.com
Fully wildcard: *

The top-level domain suffix cannot be a wildcard (e.g. www.stackoverflow.*); this is disallowed for security reasons. In order to match multiple TLD suffixes, a script will need to include a specific @match directive for each.
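Here is a minimal sketch of the three host styles. hostMatches is a hypothetical helper, and it assumes (as the browser documentation for match patterns describes) that *.stackoverflow.com also covers the bare domain stackoverflow.com.

```javascript
// Hypothetical helper: compare the host part of an @match pattern with a hostname.
function hostMatches(patternHost, hostname) {
  if (patternHost === "*") return true; // fully wildcard
  if (patternHost.startsWith("*.")) {
    const base = patternHost.slice(2);
    // Assumption: the base domain itself and any of its subdomains match.
    return hostname === base || hostname.endsWith("." + base);
  }
  return hostname === patternHost; // fully explicit
}

console.log(hostMatches("*.stackoverflow.com", "meta.stackoverflow.com"));     // true
console.log(hostMatches("*.stackoverflow.com", "stackoverflow.com"));          // true
console.log(hostMatches("*.stackoverflow.com", "stackoverflow.com.evil.com")); // false
console.log(hostMatches("www.stackoverflow.com", "stackoverflow.com"));        // false
```

Because the wildcard can only sit at the front of the host, a look-alike domain such as stackoverflow.com.evil.com cannot slip through.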

Path

The path portion of the URL pattern is the most permissive, as the only rule is that it must start with a forward slash /. The rest can be any combination of characters and wildcards.

In this section, wildcards * act as a standard glob operator, simply matching 0 or more characters.

The value that gets matched against the path portion of the pattern is officially the URL path plus the URL query string, including the ? between them (e.g. in google.com/search?q=test, the query string is q=test). This is a potential pitfall for patterns that aim to match the end of a given URL path, since they may be foiled by an added query string.

Also note that the path does not include URL fragments (the part of the URL at the end that follows a hash #, e.g. www.example.com#main); @match directives ignore URL fragments by design to prevent abuse of unintentional matches.
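The query-string pitfall and the ignored fragment can be sketched as follows. pathMatches is a hypothetical helper that assumes the pattern's path is compared, case-sensitively, against the URL's path plus query string.

```javascript
// Hypothetical helper: test the path part of an @match pattern against a URL.
// Assumption: the compared value is pathname + search; the fragment is never included.
function pathMatches(patternPath, url) {
  const { pathname, search } = new URL(url);
  const escaped = patternPath
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape everything except '*'
    .replace(/\*/g, ".*");                 // '*' matches zero or more characters
  return new RegExp("^" + escaped + "$").test(pathname + search);
}

console.log(pathMatches("/search", "https://google.com/search"));         // true
console.log(pathMatches("/search", "https://google.com/search?q=test"));  // false: query string foils it
console.log(pathMatches("/search*", "https://google.com/search?q=test")); // true
console.log(pathMatches("/", "https://www.example.com/#main"));           // true: fragment ignored
```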

A Word of Caution

It's fairly obvious, but it bears repeating that scripts should be careful to @include exactly and exclusively the URLs that the script is intended to be run on. Runaway scripts can range from minor annoyances to major problems; always double check that scripts are running only where they're supposed to be, and use @exclude to add guardrails if necessary or convenient.

Upvotes: 24

WBT

Reputation: 2465

You cannot use regular expressions with @match, while you can with @include.

However, @include will give your users scarier security warnings about the script applying to all sites.

This is even though an @include expression permits you to be more restrictive about the sites a script applies to (e.g. specifying that part of a URL be numeric using the regex fragment [0-9]+, or using ^https?:// to apply a script to just those two schemes). With @match, each of those cases falls back to the more general non-regex globbing operator *, which causes the script to apply more broadly.
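As a rough illustration, compare a regex-style @include with the closest path glob. The URLs and the /item/ path are made up for the example, and the second expression is only an approximation of what a @match path of /item/* would accept.

```javascript
// Hypothetical regex-mode @include: http or https only, numeric item id only.
const includeRegex = new RegExp(String.raw`^https?://www\.example\.com/item/[0-9]+$`, "i");

// Approximation of the @match path "/item/*": anything at all after "/item/".
const matchPathGlob = /^\/item\/.*$/;

console.log(includeRegex.test("https://www.example.com/item/42"));  // true
console.log(includeRegex.test("https://www.example.com/item/abc")); // false: not numeric
console.log(includeRegex.test("ftp://www.example.com/item/42"));    // false: wrong scheme
console.log(matchPathGlob.test("/item/abc"));                       // true: the glob cannot require digits
```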

Upvotes: 39