user49438

Reputation: 909

Do any common email clients pre-fetch links rather than images?

Although I know a lot of email clients will pre-fetch or otherwise cache images, I am unaware of any that pre-fetch regular links like <a href="somelinkhere">some link</a>.

Is this something some email clients do? If it is, is there a sort of no-follow type of rel attribute that can be added to the link to help prevent this?

Upvotes: 15

Views: 9071

Answers (4)

ku1ik

Reputation: 1938

As of Feb 2017, Outlook (https://outlook.live.com/) scans emails arriving in your inbox and sends all URLs it finds to Bing, to be indexed by the Bing crawler.

This effectively makes all one-time-use links (login, password reset, etc.) useless.

(Users of my service were complaining that one-time login links didn't work for some of them, and it turned out that BingPreview/1.0b was hitting the URL before the user even opened the inbox.)

Drupal seems to be experiencing the same problem: https://www.drupal.org/node/2828034

Upvotes: 15

Robert Paulsen

Reputation: 5151

You won't find any native email clients that do that, but you could come across some "web accelerators" that try to pre-fetch links when you use web-based email. I've never seen anything to prevent it.

Links (GETs) aren't supposed to "do" anything; only a POST is. For example, the "unsubscribe me" link in your email should not directly unsubscribe the subscriber. It should GET a page the subscriber can then POST from.

W3Schools does a good job of explaining how you should expect a GET to work (caching, etc.):

http://www.w3schools.com/tags/ref_httpmethods.asp
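To illustrate the GET/POST split described above, here is a minimal sketch in Python. The `subscribers` set, the `/unsubscribe` path and the `handle_unsubscribe` function are all hypothetical names made up for this example, not part of any framework; a real handler would also need to authenticate the request.

```python
from html import escape
from typing import Tuple

# Hypothetical in-memory subscriber store; a real service would use a database.
subscribers = {"alice@example.com", "bob@example.com"}


def handle_unsubscribe(method: str, email: str) -> Tuple[int, str]:
    """Return (status, body) for a request to a hypothetical /unsubscribe URL."""
    if method == "GET":
        # Safe method: only render a confirmation form, change nothing.
        form = (
            '<form method="POST" action="/unsubscribe">'
            f'<input type="hidden" name="email" value="{escape(email)}">'
            '<button type="submit">Unsubscribe me</button>'
            "</form>"
        )
        return 200, form
    if method == "POST":
        # The state change happens only here, after the form is submitted.
        subscribers.discard(email)
        return 200, "You have been unsubscribed."
    return 405, "Method not allowed"
```

With this shape, a pre-fetcher that merely requests the link from the email gets the form back and the subscriber list stays untouched.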

Upvotes: 1

Byren Higgin

Reputation: 562

Common email clients do not have crawlers that search or pre-build the documents referenced by <a> tags, if that is what you're asking; pre-building and caching a web location could be an immense task if the page is dynamic or large enough.

Images are stored locally to reduce the load time of the email, which is both a convenience and a reduction in network load, but when you open a hyperlink in an email it loads in your web browser rather than in the email client.

I just ran a test using analytics to report any server traffic, and an email containing just

<a href="linktomysite">linktomysite</a>

did not result in any crawls of the site from Outlook 2007, Outlook 2010, Thunderbird, or Apple Mail (Yosemite). You could try a Wireshark capture to check for network traffic from the client to specific outgoing IPs if you're really interested.
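If you want to repeat this kind of test without an analytics package, a small logging server is enough. The following is only a sketch using Python's standard `http.server`; the names `hits`, `record_hit`, `LoggingHandler` and `serve`, and the port number, are assumptions made for this example.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

hits = []  # each entry: (timestamp, method, path, user_agent)


def record_hit(method, path, user_agent):
    """Store one request in the in-memory log and return the stored entry."""
    entry = (datetime.now(timezone.utc).isoformat(), method, path, user_agent or "-")
    hits.append(entry)
    return entry


class LoggingHandler(BaseHTTPRequestHandler):
    def _log_and_reply(self):
        # Log who asked for what, then answer with a trivial 200 response.
        ts, method, path, ua = record_hit(
            self.command, self.path, self.headers.get("User-Agent")
        )
        print(f"{ts} {method} {path} UA={ua}")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(b"ok\n")

    do_GET = _log_and_reply
    do_HEAD = _log_and_reply


def serve(port=8000):
    """Run the logging server until interrupted (port is an arbitrary choice)."""
    HTTPServer(("", port), LoggingHandler).serve_forever()
```

Call `serve()` on a publicly reachable host, email yourself a link with a unique path, and watch for requests that arrive before you click anything. The User-Agent column shows which client or scanner made them.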

Upvotes: 1

C3roe

Reputation: 96383

Although I know a lot of email clients will pre-fetch or otherwise cache images.

Even that is not a given.

Many email clients – be they web-based, or standalone applications – have privacy controls that prevent images from being automatically loaded, to prevent tracking of who read a (specific) email.

On the other hand, there are clients such as Gmail’s web interface that try to establish the standard of downloading all referenced external images, presumably to mitigate/invalidate such attempts at user tracking: if a large majority of Gmail users have those images downloaded automatically, whether they actually opened the email or not, the data that can be gained for analytical purposes becomes watered down.

I am unaware of any that pre-fetch regular links like <a href="somelinkhere">some link</a>

Let’s stay with Gmail for example purposes, but others will behave similarly: since Google is always interested in “what’s out there on the web”, it is highly likely that their crawlers will follow that link to see what it contains/leads to, for their own indexing purposes.

If it is, is there a sort of no-follow type of rel attribute that can be added to the link to help prevent this?

rel="nofollow" concerns ranking rather than crawling, and a noindex directive (either in robots.txt or via a meta element/rel attribute) also won’t keep nosy bots from at least requesting the URL.

Plus, other clients involved, such as a firewall, anti-virus or anti-malware tool, might also request it for analytical purposes without any user actively triggering it.


If you want to be (relatively) sure that an action is triggered only by a (specific) human user, then use URLs in emails or other kinds of messages only to lead the user to a website where they confirm the action via a form using method=POST. Whether some kind of authentication or CSRF protection is also needed goes a little beyond the scope of this question.
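Applied to one-time links, that advice could look roughly like the sketch below: a GET only checks the token, and only a POST consumes it. The `pending_tokens` store and the `issue_token` and `handle_login_link` functions are hypothetical names for this example, and authentication and CSRF protection are deliberately left out.

```python
import secrets

# Hypothetical in-memory token store mapping token -> user id.
pending_tokens = {}


def issue_token(user_id: str) -> str:
    """Create a one-time token to embed in the emailed URL."""
    token = secrets.token_urlsafe(32)
    pending_tokens[token] = user_id
    return token


def handle_login_link(method: str, token: str):
    """Return (status, result) for a request carrying a one-time token."""
    if method in ("GET", "HEAD"):
        # Pre-fetchers and scanners land here; the token is only inspected.
        if token in pending_tokens:
            return 200, "confirm-form"
        return 404, "invalid-or-used"
    if method == "POST":
        # Consume the token exactly once, when the form is submitted.
        user_id = pending_tokens.pop(token, None)
        if user_id is None:
            return 404, "invalid-or-used"
        return 200, user_id
    return 405, "method-not-allowed"
```

A bot that requests the URL any number of times leaves the token valid. It is spent only when the confirmation form is actually submitted.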

Upvotes: 11
