Reputation: 63949

Practical non-image based CAPTCHA approaches?

It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!

We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:

http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs

The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!

However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.

I have written a traditional CAPTCHA control for ASP.NET which we can re-use.

CaptchaImage

However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.

I've seen things like:

ASCII text captcha: \/\/(_)\/\/
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?

Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.

Ideas?

Upvotes: 316

Answers (30)

Ross

Reputation: 46987

Actually it could be an idea to have a programming related captcha set. For example:

Captcha

There is the possibility of someone building a syntax checker to bypass this, but that is a lot more work than bypassing an ordinary captcha. You get the idea of having a subject-related captcha, though.

Upvotes: 5

ceejayoz

Reputation: 180014

My favourite CAPTCHA ever:

Captcha

Upvotes: 211

Aristos

Reputation: 66641

I have some ideas about this that I'd like to share with you.

First Idea to avoid OCR

A captcha in which part of the image is hidden from the user, while the full image contains both codes together. OCR programs and captcha farms read the whole image, including both the visible and the hidden part, try to decode both, and fail on submit. I have already implemented this one and it works online.

http://www.planethost.gr/IdeaWithHiddenPart.gif

Second Idea to make it easier

A page with many words from which the human must select the right one. I have also created this one; it is simple. The words are clickable images, and the user must click on the right one.

http://www.planethost.gr/ManyWords.gif

Third Idea without images

The same as the previous one, but with divs and text or small icons. The user must click only on the correct div, letter, or image.

http://www.planethost.gr/ArrayFromDivs.gif

Final Idea - I call it CicleCaptcha

One more, my CicleCaptcha: the user must locate a point on an image. If they find it and click it, they are a person; machines will probably fail, or will need new software to find a way around this one.

http://www.planethost.gr/CicleCaptcha.gif

Any criticism is welcome.

Upvotes: 10

Gennady Vanin Геннадий Ванин

Reputation: 10384

1) Human solvers

All the solutions mentioned here are circumvented by the human-solver approach. A professional spambot keeps hundreds of connections open, and when it cannot solve a CAPTCHA itself, it passes a screenshot to remote human solvers.

I frequently read that human solvers of CAPTCHAs break the law. Well, that is written by those who do not know how this (spamming) industry works.
Human solvers do not directly interact with the sites whose CAPTCHAs they solve. They do not even know which sites the CAPTCHAs sent to them were taken from. I am aware of dozens (if not hundreds) of companies and websites offering human-solver services, but not a single one that interacts directly with the boards being broken.
These companies do not infringe any law, so CAPTCHA solving is a completely legal (and officially registered) business. They do not have criminal intentions and might, for example, be used for remote testing, investigations, proofs of concept, prototyping, etc.

2) Context-based Spam

AI (artificial intelligence) bots determine context and maintain context-sensitive dialogues at different times from different IP addresses (in different countries). Even blog authors frequently fail to realize that comments are from bots. I shall not go into much detail but, for example, bots can scrape human dialogues from the web, store them in a database, and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even by humans.

The most-voted answer says:

*"The theory being that:
- A spam bot will not support JavaScript and will submit what it sees
- If the bot does support JavaScript it will submit the form instantly
- The commenter has at least read some of the page before posting"*

That answer, the honeypot answer, and most other answers in this thread are just plain wrong.
I daresay they are victim-doomed approaches.

Most spambots work through local and remote JavaScript-aware (patched and managed) browsers from different IPs (in different countries), and they are clever enough to circumvent honey traps and honeypots.

A different problem is that even blog owners frequently cannot detect that comments are from a bot, since they really are human dialogue and comments harvested from other web boards (forums, blog comments, etc.).

3) Conceptually New Approach

Sorry, I removed this part as premature.

Upvotes: 5

Tama

Reputation:

I've been using the following simple technique; it's not foolproof. If someone really wants to bypass it, it's easy to look at the source (i.e. it's not suitable for the Google CAPTCHA), but it should fool most bots.

Add 2 or more form fields like this:

<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />

Then use CSS to hide them:

.hideme {
    display: none;
}

On submit, check whether those form fields have any data in them; if they do, fail the form post. The reasoning is that bots will read the HTML and attempt to fill in every form field, whereas humans won't see the input fields and will leave them alone.
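
A minimal sketch of that server-side check, written in Python purely for illustration (the answer does not say which server language it uses). The field names match the inputs above; the `form` dict is a stand-in for whatever your framework gives you as posted form data.

```python
# Honeypot field names, matching the hidden inputs shown above.
HONEYPOT_FIELDS = ("botcheck1", "botcheck2")


def looks_like_bot(form: dict) -> bool:
    """Return True if any hidden honeypot field came back non-empty."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)


# A human never sees the hidden inputs, so they arrive empty.
human_post = {"comment": "Nice post", "botcheck1": "", "botcheck2": ""}
# A naive bot fills in every field it finds in the HTML.
bot_post = {"comment": "Buy now", "botcheck1": "x", "botcheck2": ""}

print(looks_like_bot(human_post))
print(looks_like_bot(bot_post))
```

If `looks_like_bot` returns True, fail the form post.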

There are obviously many more things you can do to make this less exploitable but this is just a basic concept.

Upvotes: 16

GateKiller

Reputation: 75869

A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:

<input type="hidden" id="antiSpam" name="antispam" value="lalalala" />

I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:

var antiSpam = function() {
        // The hidden input needs id="antiSpam" for this lookup to succeed.
        var a = document.getElementById("antiSpam");
        if (a) {
                if (isNaN(a.value)) {
                        // Still the bogus placeholder: start counting from zero.
                        a.value = 0;
                } else {
                        a.value = parseInt(a.value, 10) + 1;
                }
        }
        // Pass the function itself rather than a string to be evaluated.
        setTimeout(antiSpam, 1000);
};

antiSpam();

Then, when the form is submitted, if the antispam value is still "lalalala", I mark it as spam. If the antispam value is an integer, I check whether it is above something like 10 (seconds). If it's below 10, I mark it as spam; if it's 10 or more, I let it through.

If AntiSpam = A Integer
    If AntiSpam >= 10
        Comment = Approved
    Else
        Comment = Spam
Else
    Comment = Spam
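
The pseudocode above could be sketched in Python roughly like this. The function name and the returned strings are made up for illustration; the 10-second threshold is the one mentioned above.

```python
MIN_SECONDS = 10  # threshold from the description above; tune as needed


def classify(antispam_value) -> str:
    """Return "approved" or "spam" for the submitted hidden-field value."""
    try:
        seconds = int(antispam_value)
    except (TypeError, ValueError):
        # Still "lalalala" (or missing, or otherwise non-numeric):
        # the JavaScript counter never ran.
        return "spam"
    return "approved" if seconds >= MIN_SECONDS else "spam"


for value in ("lalalala", "3", "10", "42"):
    print(value, classify(value))
```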

The theory being that:

A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting

The downside to this method is that it requires JavaScript: if you don't have JavaScript enabled, your comment will be marked as spam. However, I do review comments marked as spam, so this is not a problem.

Response to comments

@MrAnalogy: The server-side approach sounds like quite a good idea and is exactly the same as doing it in JavaScript. Good call.

@AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.

Upvotes: 204

balu

Reputation: 3831

What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trust-worthy person to start the web I suggest using this CAPTCHA to make sure he is absolutely and 100% human.

Rapidshare CAPTCHA - Riemann Hypothesis http://codethief.eu/kram/_/rapidshare_captcha2.jpg

Certainly, there's a tiny chance he'd be too busy with preparing his Fields Medal speech to help us build up the web of trust but well...

Upvotes: 20

DavGarcia

Reputation: 18792

I've been using http://stopforumspam.com as a first line of defense against bots. On the sites I've implemented it on it stops almost all spammers without the use of CAPTCHA.

Upvotes: 3

dave hollis

Reputation: 1

I think Bitcoin makes a great practical non-image-based CAPTCHA; see http://bitcoin.org for the details.

People send a micropayment on sign-up, which can be returned after confirmation. You don't get back the time you spent trying to figure out the captcha.

Upvotes: 1

Justin Fay

Reputation: 2606

This one uses 1px blocks to generate what looks like an image but is pure html/css. See the link here for an example: http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true

Upvotes: 4

Ming-Tang

Reputation: 17651

The fix-the-syntax-error CAPTCHA:

echo "Hello, world!;
for (int $i = 0; $i < 10; $i ++ {
  echo $i /*
}

The parens and quotes are randomly removed.

Bots can automatically check syntax errors, but they don't know how to fix them!

Upvotes: 2

Beiru

Reputation: 312

Have you tried http://sblam.com/en.html? From what I know, it's a good alternative to a captcha, and it's completely transparent to users.

Upvotes: 1

Brandon - Free Palestine

Reputation: 16656

Recently, I started adding a textarea tag with the name and id set to "message". I hide it with CSS (display:none). Spam bots see it, fill it in, and submit the form. Server side, if the textarea with that name/id is filled in, I mark the post as spam.

Another technique I'm working on is randomly generating names and ids, with some being spam checks and others being regular fields.
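
One way the random-names idea might look, as a rough Python sketch. Everything here is an assumption for illustration: the function names, the `f_` prefix, and the idea of keeping the generated mapping in the server-side session between rendering the form and receiving the post.

```python
import secrets


def make_field_map(real_fields, decoy_count=2):
    """Map random form-field names to a real field name, or None for a decoy."""
    mapping = {}
    for real in real_fields:
        mapping["f_" + secrets.token_hex(8)] = real
    for _ in range(decoy_count):
        mapping["f_" + secrets.token_hex(8)] = None
    return mapping


def read_submission(mapping, form):
    """Return the real field values, or None if any decoy field was filled in."""
    values = {}
    for random_name, real in mapping.items():
        submitted = form.get(random_name, "")
        if real is None:
            if submitted.strip():
                return None  # a decoy was filled in: treat the post as spam
        else:
            values[real] = submitted
    return values


# Render the form using the random names, hide the decoys with CSS,
# and keep `mapping` on the server until the form comes back.
mapping = make_field_map(["name", "message"])
print(sorted(mapping))
```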

This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)

Upvotes: 7

Boraski

Reputation: 1

What about audio? Provide an audio sample with a voice saying something. Let the user type what he heard. It could also be a sound effect to be identified by him.

As a bonus, this could help speech recognizers create closed captions, just as reCAPTCHA helps with scanning books.

Probably stupid... just got this idea.

Upvotes: 1

user51511

Reputation:

On my blog I don't accept comments unless JavaScript is on, and I post them via Ajax. It keeps out all bots. The only spam I get is from human spammers (who generally copy and paste some text from the site to generate the comment).

If you have to have a non-javascript version, do something like:

[some operation] of [x] in the following string [y]

Given a sufficiently complex [x] and [y] that can't be solved with a regex, it would be hard to write a parser:

count the number of short words in [dog,dangerous,danceable,cat] = 2

what is the shortest word in [dog,dangerous,danceable,catastrophe] = dog

what word ends with x in [fish,mealy,box,stackoverflow] = box

which url is illegal in [apple.com, stackoverflow.com, fish oil.com] = fish oil.com

All this can be done server side easily; if the number of options is large enough and they rotate frequently, it would be tough to get them all. Also, never give the same user the same type more than once per day or so.
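
To illustrate how little server code this needs, here is a rough Python sketch of two of the question types above. The word pool and function names are made up for the example; a real pool would have to be much larger and rotated, as noted.

```python
import random

# Hypothetical word pool, borrowed from the examples above.
WORDS = ["dog", "cat", "box", "fish", "mealy",
         "dangerous", "danceable", "catastrophe"]


def shortest_word_challenge(rng: random.Random):
    """Return (question, answer) asking for the unique shortest word."""
    while True:
        words = rng.sample(WORDS, 4)
        shortest = min(words, key=len)
        # Retry if two words tie for shortest, so the answer is unambiguous.
        if sum(len(w) == len(shortest) for w in words) == 1:
            return f"what is the shortest word in [{','.join(words)}]", shortest


def ends_with_challenge(rng: random.Random):
    """Return (question, answer) asking which word ends with a given letter."""
    while True:
        words = rng.sample(WORDS, 4)
        answer = rng.choice(words)
        letter = answer[-1]
        # Retry if another word in the list ends with the same letter.
        if sum(w.endswith(letter) for w in words) == 1:
            return f"what word ends with {letter} in [{','.join(words)}]", answer


def check(answer: str, submitted: str) -> bool:
    """Compare the user's reply with the stored answer, ignoring case and spaces."""
    return submitted.strip().lower() == answer


rng = random.Random()
question, answer = shortest_word_challenge(rng)
print(question)
```

The answer stays on the server (for example in the session) and only the question text is sent to the browser.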

Upvotes: 3

Chris S

Reputation: 65436

Tying it into the chat rooms would be a fun way of doing a captcha. A sort of live Turing test. Obviously it'd rely on someone being online to ask a question.

Upvotes: 1

Bhasker Pandya

Reputation: 193

Why not set simple programming problems that users can answer in their favourite language, then run the code on the server and see if it works? Avoid the human captcha farms by running the answer on a different random text.

Example: "Extract domain name from - s = [email protected]"

Answer in Python: "return = etc."

Similar domain specific knowledge for other sub-sites.

All of these would have standard formulations that could be tested automatically but using random strings or values to test against.

Obviously this idea has many flaws ;)

Also - only allow one login attempt per 5 minute period.

Upvotes: 1

L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳

Reputation: 12570

Just make the user solve simple arithmetic expressions:

2 * 5 + 1
2 + 4 - 2
2 - 2 * 3

etc.

Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:

import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby

Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.

Upvotes: 16

Kolky

Reputation: 2977

I had a vBulletin forum that got tons of spam. Adding one extra rule fixed it all: making people type in the capital letters of a word. As our website is named 'TrefPuntMagic', they had to type in 'TPM'. I know it is not dynamic, and if a spammer really wants to spam our site they can make a workaround, but we're just one of many, many vBulletin forums they target and this is an easy fix.
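
For what it's worth, the check itself is tiny. A Python sketch (the function names are invented for the example; vBulletin itself would do this through its own question/answer settings):

```python
def capital_letters(word: str) -> str:
    """Collect the upper-case letters of a word, in order."""
    return "".join(ch for ch in word if ch.isupper())


def check(word: str, submitted: str) -> bool:
    """Accept the reply only if it matches the word's capital letters exactly."""
    return submitted.strip() == capital_letters(word)


print(capital_letters("TrefPuntMagic"))
```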

Upvotes: 1

Clay Nichols

Reputation: 12139

How about just checking to see if JavaScript is enabled?

Anyone using this site is surely going to have it enabled. And from what folks say, the Spambots won't have JavaScript enabled.

Upvotes: 1

Jacobbus

Reputation: 41

CAPTCHAs check if you are human or computer. The problem is that after that a computer needs to judge whether you are human.

So a solution would be to let one user fill out a CAPTCHA and let the next user check it. The problem is of course the time gap.

Upvotes: 1

Chris S

Reputation: 65436

Do lots of these JavaScript solutions work with screen readers? And images without a meaningful alt attribute probably break WCAG.

Upvotes: 1

Jeff Atwood

Reputation: 63949

Someone also suggested the Raphael JavaScript library, which apparently lets you draw on the client in all popular browsers:

http://dmitry.baranovskiy.com/raphael/

... but that wouldn't exactly work with my <noscript> case, now would it? :)

Upvotes: 3

nlucaroni

Reputation: 47934

Be sure it isn't something Google can answer, though. That also shows an issue with it: order of operations!

Upvotes: 25

Jarod Elliott

Reputation: 15670

Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
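
To make the two readings concrete, here is a small Python illustration, followed by one possible way to sidestep the ambiguity by only ever asking questions with a single operator. The `make_question` helper and its word list are made up for this sketch.

```python
import operator
import random

# Standard precedence: multiplication binds tighter than subtraction.
conventional = 7 - 3 * 2       # 7 - 6
# Strict left-to-right reading, as some people would do it in their head.
left_to_right = (7 - 3) * 2    # 4 * 2
print(conventional, left_to_right)  # prints: 1 8

# Hypothetical operator names used to phrase the question in words.
OPS = {"plus": operator.add, "minus": operator.sub, "times": operator.mul}


def make_question(rng: random.Random):
    """Return (question, answer) with one operator, so precedence never matters."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    name = rng.choice(sorted(OPS))
    return f"what is {a} {name} {b}?", OPS[name](a, b)


print(make_question(random.Random())[0])
```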

Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.

E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh

In this case "stack" would be the CAPTCHA. There are obviously numerous variations on this idea.

Edit: Example variations to address some of the potential problems identified with this idea:

using randomly coloured letters instead of bold/italic.
using every second red letter for the CAPTCHA (reduces the possibility of bots identifying differently formatted letters to guess the CAPTCHA)
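
A rough Python sketch of the basic bold-letters version, assuming the challenge is sent as HTML with `<b>` tags; the function names are invented for the example. The second helper reads the highlighted letters back out of the markup, which is the kind of thing the variations above are meant to make harder for a bot.

```python
import random
import re
import string


def make_highlight_captcha(answer: str, rng: random.Random, max_filler: int = 6) -> str:
    """Scatter the answer's letters, wrapped in <b> tags, among random filler."""
    def filler() -> str:
        length = rng.randint(1, max_filler)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

    pieces = []
    for letter in answer:
        pieces.append(filler())
        pieces.append(f"<b>{letter}</b>")
    pieces.append(filler())
    return "".join(pieces)


def highlighted_letters(markup: str) -> str:
    """Read the bold letters back out of the markup."""
    return "".join(re.findall(r"<b>(.)</b>", markup))


markup = make_highlight_captcha("stack", random.Random())
print(markup)
print(highlighted_letters(markup))
```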

Upvotes: 14

Derek Park

Reputation: 46846

@lance

Who says you have to create all the images on the server with each request? Maybe you could have a static list of images or pull them from Flickr. I like the "click on the kitten" CAPTCHA idea. http://www.thepcspy.com/kittenauth.

If you pull from a static list of images, it becomes trivial to circumvent the CAPTCHA, because a human can classify them and then the bot would be able to answer the challenges easily. Even if a bot can't answer all of them, it can still spam. It only needs to be able to answer a small percent of CAPTCHAs, because it can always just retry when an attempt fails.

This is actually a problem with puzzles and such, too, because it's extremely difficult to have a large set of challenges.
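
The retry argument can be put in numbers. A small Python sketch, assuming each attempt is independent and using a made-up 5% success rate per attempt:

```python
def chance_of_getting_through(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


# 0.05 is a hypothetical per-attempt success rate, not a measured figure.
for attempts in (1, 10, 100, 1000):
    print(attempts, round(chance_of_getting_through(0.05, attempts), 4))
```

Even a low per-attempt success rate approaches certainty once the bot is free to keep retrying, which is why limiting retries matters as much as the difficulty of any single challenge.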

Upvotes: 1

dsims

Reputation: 1322

My solution was to put the form on a separate page and pass a timestamp to it. On that page I only display the form if the timestamp is valid (not too fast, not too old). I found that bots would always hit the submission page directly and only humans would navigate there correctly.
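
A minimal Python sketch of the timestamp check. The answer only says a timestamp is passed; signing it with an HMAC so that a bot cannot simply invent one is an addition of this sketch, and the key, the limits, and the function names are all placeholders.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder value
MIN_AGE = 5      # seconds; "not too fast" (hypothetical limit)
MAX_AGE = 3600   # seconds; "not too old" (hypothetical limit)


def _sign(timestamp: str) -> str:
    return hmac.new(SECRET_KEY, timestamp.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Build the value to put in the link to the submission page."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}.{_sign(ts)}"


def token_is_valid(token: str, now=None) -> bool:
    """Accept only untampered tokens that are neither too fresh nor too old."""
    now = time.time() if now is None else now
    ts, _, sig = token.partition(".")
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False
    # The signature matched, so ts is a timestamp this server issued.
    age = now - int(ts)
    return MIN_AGE <= age <= MAX_AGE


token = issue_token()
print(token_is_valid(token))  # False right away: arrived "too fast"
```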

This won't work if you have the form on the content page itself, like you do now, but you could show/hide the link to the special submission page based on NoScript. A minor inconvenience for such a small percentage of users.

Upvotes: 2

thing2k

Reputation: 608

Unless I'm missing something, what's wrong with using reCAPTCHA, since all the work is done externally?

Just a thought.

Upvotes: 56

Josh

Reputation: 265

Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.

If it is sufficiently simple, it will not be difficult to code around. I see two threats here:

random spambots and the human spambots that might back them up; and
bots created to game Stack Overflow

With simple arithmetic, you might beat off threat #1, but not threat #2.

Upvotes: 6

Hoffmann

Reputation: 14719

Use a simple text CAPTCHA and then ask the users to enter the answer backwards or only the first letter, or the last, or another random thing.

Another idea is to make an ASCII image, like this (from the Portal game's end sequence):

                             .,---.
                           ,/XM#MMMX;,
                         -%##########M%,
                        -@######%  $###@=
         .,--,         -H#######$   $###M:
      ,;$M###MMX;     .;##########$;HM###X=
    ,/@##########H=      ;################+
   -+#############M/,      %##############+
   %M###############=      /##############:
   H################      .M#############;.
   @###############M      ,@###########M:.
   X################,      -$=X#######@:
   /@##################%-     +######$-
   .;##################X     .X#####+,
    .;H################/     -X####+.
      ,;X##############,       .MM/
         ,:+$H@M#######M#$-    .$$=
              .,-=;+$@###X:    ;/=.
                     .,/X$;   .::,
                         .,    ..

And give the user some options like: IS A, LIE, BROKEN HEART, CAKE.

Upvotes: 2
