Zaz

Reputation: 48759

Protecting email addresses from spam bots / web crawlers

How do you prevent emails being gathered from web pages by email spiders? Does mailto: linking them increase the likelihood of them being picked up? Is URL-encoding useful?

Obviously the best counter-measure is to only show email addresses to logged-in users, or to provide a contact form instead of an email address. But in terms of purely client-side solutions, what is available?

Upvotes: 18

Views: 10911

Answers (6)

LePain

Reputation: 1

As mentioned by Scott's answer, the link can be created by JavaScript directly after page load. For users it will appear as though the link was part of the HTML document.

As an alternative approach, you can perform base64 encoding on the email address to make every part of the address invisible to HTML-only crawlers.

Keep in mind that Google will execute this JavaScript code while crawling (in a headless Chromium instance) and will be able to display the email address in the search results (depending on search query).

<!doctype html>
<html>
    <head>
        <script type="text/javascript">
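            window.onload = function(){
                // Decode the base64 string only at runtime, so the plain
                // address never appears in the HTML source.
                var address = atob("dXNlckBkb21haW4uY29t");
                var anchor = document.createElement("a");
                // textContent inserts the address as text rather than parsing it as markup.
                anchor.textContent = address;
                anchor.href = "mailto:" + address;
                document.body.appendChild(anchor);
            };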
        </script>
    </head>
    <body>
        <!-- Email link added here as the last element at runtime-->
    </body>
</html>
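
To produce the encoded string for a different address, something like the following could be used. This is only a sketch: encodeAddress and decodeAddress are hypothetical helper names, and it assumes an environment where btoa and atob exist as globals (browsers and recent Node.js versions). Note that btoa only accepts Latin-1 characters, so an address with other characters would need extra handling.

```javascript
// Hypothetical helpers, not part of any library.
function encodeAddress(address) {
  // btoa turns a (Latin-1) string into its base64 representation.
  return btoa(address);
}

function decodeAddress(encoded) {
  // atob reverses btoa; this is what the page does at load time.
  return atob(encoded);
}

const encoded = encodeAddress("user@domain.com");
console.log(encoded);                 // dXNlckBkb21haW4uY29t
console.log(decodeAddress(encoded));  // user@domain.com
```

The encoding step would be run once, offline, and only the resulting string pasted into the page.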

Because crawlers that execute JavaScript can still read the address, I recommend another option: put the email address in an SVG file and make sure the SVG editor converts the characters to individual paths. This way it will appear sharp to the user, even on high-density displays. However, the user will not be able to click or copy-paste it, so it works best for a short, simple email address.

Upvotes: 0

SillySam

Reputation: 98

If you don't need the mail link you could replace the @ symbol with [at]

Example: sam [at] example.com
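
If the addresses are stored somewhere in their normal form, the substitution could be applied when the page is generated. A minimal sketch, assuming a hypothetical helper called maskAddress:

```javascript
// Hypothetical helper: replaces the "@" with " [at] " for display.
function maskAddress(address) {
  // An address has a single "@" separator in the usual case,
  // so replacing the first occurrence is enough here.
  return address.replace("@", " [at] ");
}

console.log(maskAddress("sam@example.com")); // sam [at] example.com
```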

Upvotes: 0

Zaz

Reputation: 48759

Years later, I've created the following jQuery for a different website:

$(".email").each(function() {
    // The first "..." stands for "@"; every remaining "..." stands for ".".
    var decode = function(s) { return s.replace("...", "@").replace(/\.\.\./g, "."); };
    $(this).html( decode($(this).html()) );
    $(this).attr( "href", decode($(this).attr("href")) );
});

Emails are then written as:

<a href="mailto:bob.smith...example...com" class="email">bob.smith...example...com</a>

It's not perfect, but it's very simple and seems to thwart most email harvesters. The advantage to this method is that someone not using JavaScript will probably be able to work out what the actual email address is.
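
The replacement logic can be checked on its own, without jQuery or a page. In this sketch decodeDots is a hypothetical name for the same two replace calls:

```javascript
// Hypothetical standalone version of the two replace calls used above.
function decodeDots(s) {
  // A string pattern replaces only the first "...", which becomes "@";
  // the global regex then turns every remaining "..." into ".".
  return s.replace("...", "@").replace(/\.\.\./g, ".");
}

console.log(decodeDots("bob.smith...example...com"));        // bob.smith@example.com
console.log(decodeDots("mailto:bob.smith...example...com")); // mailto:bob.smith@example.com
```

Single dots in the local part (as in bob.smith) are left alone because only runs of three dots are matched.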


Check out this study on various email obfuscation methods.

Upvotes: 6

Dagg Nabbit

Reputation: 76736

Most email spiders don't have javascript interpreters, so if you really need the mailto: you can inject it with javascript... just make sure the address is obscured in the javascript somehow, e.g.

myLink.href='mai'+'lto:'+'bob'
           +'@'
           +'example.com';

If you need to display the email address on the page, a common solution is to generate an image using something like php's gd (although the javascript injection should work ok for this too).

The idea is to remove the email addresses from the HTML and inject them with javascript. That way the email address doesn't appear in its original form in any of the HTTP traffic, which is what the spider is looking at.
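
String concatenation is only one way to obscure the address in the script. As a different illustration (not the method shown above), the address could be stored reversed and restored at runtime. This is a sketch with a hypothetical helper name, unreverse:

```javascript
// Hypothetical helper: restores a string that was stored back to front.
function unreverse(s) {
  return s.split("").reverse().join("");
}

// The script source contains only the reversed form.
const address = unreverse("moc.elpmaxe@bob");
console.log(address); // bob@example.com

// In a page, the result could then be assigned as in the snippet above:
// myLink.href = "mailto:" + address;
```

Like the concatenation trick, this only helps against harvesters that read the raw source without running the script.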

Upvotes: 9

Scott

Reputation: 1872

I usually split them up into separate parts and then re-combine them using javascript. The final javascript does a document.write to write out the html.

For example:

var mail = "mailto";
var namepart = "test.user";
var domainpart = "example";
var tld = "com"; 
var address = namepart + "@" + domainpart + "." + tld;
document.write('<a href="' + mail + ":" + address + '">' + address + "</a>");

Upvotes: 1

James Black

Reputation: 41858

I tend to avoid a mailto as it makes it too easy for people to harvest email addresses.

If you are going to have contact pages on your website, then just have a form; when the user submits it, your server-side code can send the message to the appropriate email address.

If you need to be able to have other people's addresses available, use numbers, names, usernames, some way to identify them.
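
As a rough sketch of that idea, the form could submit only an identifier, and the server-side code could look up the real address. Everything here is an assumption for illustration: the recipients table, the IDs, and the resolveRecipient helper are hypothetical, and actual mail sending is left out.

```javascript
// Hypothetical directory kept on the server; only the IDs ever appear in the page.
const recipients = new Map([
  ["sales", "sales@example.com"],
  ["support", "support@example.com"],
]);

function resolveRecipient(id) {
  // Unknown IDs return null, so the form cannot be used to mail arbitrary addresses.
  return recipients.has(id) ? recipients.get(id) : null;
}

console.log(resolveRecipient("sales"));   // sales@example.com
console.log(resolveRecipient("unknown")); // null
```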

If you just put an email address in a span, it is likely to be picked up even if you try to hide it. Harvesting programs can be quite sophisticated, since finding email addresses is their whole purpose.

As with most secrets, if you don't want others to get them, don't put them on the page.

Upvotes: 4
