Reputation: 63949
It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!
We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:
http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.
I have written a traditional CAPTCHA control for ASP.NET which we can re-use.
However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.
I've seen things like:
\/\/(_)\/\/
Maybe I'm just tilting at windmills here, but I'd like to have a less resource-intensive, non-image-based, <noscript>-compatible CAPTCHA if possible.
Ideas?
Upvotes: 316
Views: 84664
Reputation: 46987
Actually, it could be an idea to have a programming-related CAPTCHA set. For example:
Someone could build a syntax checker to bypass this, but it's a lot more work to bypass such a CAPTCHA. You get the idea of having a related CAPTCHA, though.
Upvotes: 5
Reputation: 66641
I have some ideas I'd like to share with you...
A CAPTCHA with a part hidden from the user, where the full image contains both codes together. OCR programs and CAPTCHA farms read the image including both the visible and the hidden part, try to decode both of them, and fail to submit. I have already built this one and it works online.
http://www.planethost.gr/IdeaWithHiddenPart.gif
A page with many words where the human must select the right one. I have also created this one; it is simple. The words are clickable images, and the user must click on the right one.
http://www.planethost.gr/ManyWords.gif
The same as the previous one, but with divs and text or small icons. The user must click only on the correct div/letter/image, whatever.
http://www.planethost.gr/ArrayFromDivs.gif
And one more, my CicleCaptcha: the user must locate a point on an image. If they find it and click it, they are a person; machines will probably fail, or someone would need to write new software to find a way around this one.
http://www.planethost.gr/CicleCaptcha.gif
Any criticism is welcome.
Upvotes: 10
Reputation: 10384
All the solutions mentioned here are circumvented by the human-solver approach. A professional spambot keeps hundreds of connections open, and when it cannot solve a CAPTCHA itself, it passes the screenshot to remote human solvers.
I frequently read that human CAPTCHA solvers break the law. Well, this is written by those who do not know how the spamming industry works.
Human solvers do not interact directly with the sites whose CAPTCHAs they solve. They do not even know which sites the CAPTCHAs were taken from. I am aware of dozens (if not hundreds) of companies and/or websites offering human-solver services, but not a single one offering direct interaction with the boards being broken.
So human solvers do not infringe any law, and CAPTCHA solving is a completely legal (and officially registered) business. These companies do not have criminal intentions and might, for example, be used for remote testing, investigations, proofs of concept, prototyping, etc.
AI (artificial intelligence) bots determine context and maintain context-sensitive dialogues at different times from different IP addresses (in different countries). Even blog authors frequently fail to notice that comments come from bots. I won't go into much detail but, for example, bots can scrape human dialogues from the web, store them in a database, and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even by humans.
The most-voted answer, as well as the honeypot answer and most of the answers in this thread, are just plain wrong.
I daresay they are doomed approaches.
Most spambots work through local and remote JavaScript-aware (patched and managed) browsers from different IPs (in different countries), and they are quite clever at circumventing honey traps and honeypots.
A different problem is that blog owners frequently cannot even detect that comments are from bots, since they really are human dialogues and comments harvested from other web boards (forums, blog comments, etc.).
Sorry, I removed this part as too hasty.
Upvotes: 5
Reputation:
I've been using the following simple technique. It's not foolproof: if someone really wants to bypass it, it's easy to look at the source (i.e. it's not a substitute for the Google CAPTCHA), but it should fool most bots.
Add 2 or more form fields like this:
<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />
Then use CSS to hide them:
.hideme {
display: none;
}
On submit, check whether those form fields contain any data; if they do, reject the form post. The reasoning is that bots read the HTML and attempt to fill in every form field, whereas humans won't see the input fields and will leave them alone.
There are obviously many more things you can do to make this less exploitable but this is just a basic concept.
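The submit check itself is only a few lines. A minimal Python sketch, assuming the posted form arrives as a plain dict of field names to values (the framework wiring is not shown and is an assumption); the field names `botcheck1` and `botcheck2` come from the HTML above, while `is_bot_submission` is a hypothetical helper name.

```python
from typing import Mapping

# Honeypot field names taken from the HTML snippet above.
HONEYPOT_FIELDS = ("botcheck1", "botcheck2")


def is_bot_submission(form: Mapping[str, str]) -> bool:
    """Return True if any hidden honeypot field was filled in.

    Humans never see these inputs, so any non-blank value is treated as
    evidence of a bot. A missing field is treated the same as an empty one.
    """
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)


if __name__ == "__main__":
    human = {"comment": "Nice post!", "botcheck1": "", "botcheck2": ""}
    bot = {"comment": "Buy now", "botcheck1": "http://spam.example", "botcheck2": ""}
    print(is_bot_submission(human))  # False
    print(is_bot_submission(bot))    # True
```

A stricter variant could also reject posts where the honeypot fields are missing entirely, since a real browser always submits them.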
Upvotes: 16
Reputation: 75869
A method that I have developed, and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value, e.g.:
<input type="hidden" name="antispam" id="antiSpam" value="lalalala" />
I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:
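// Every second, bump the hidden field: the bogus "lalalala" becomes 0,
// then 1, 2, 3... so the server can tell how long the page was open.
var antiSpam = function () {
    var a = document.getElementById("antiSpam");
    if (a) {
        if (isNaN(a.value)) {
            a.value = 0;
        } else {
            a.value = parseInt(a.value, 10) + 1;
        }
    }
    // Pass the function itself rather than a string to be eval'd.
    setTimeout(antiSpam, 1000);
};
antiSpam();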
Then, when the form is submitted, if the antispam value is still "lalalala", I mark it as spam. If the antispam value is an integer, I check whether it is above something like 10 (seconds). If it's below 10, I mark it as spam; if it's 10 or more, I let it through.
If AntiSpam is an integer
    If AntiSpam >= 10
        Comment = Approved
    Else
        Comment = Spam
Else
    Comment = Spam
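The same decision written as a small Python function, as a sketch assuming the server receives the field as a string. `classify_antispam` and its `min_seconds` parameter are illustrative names; the 10-second default is the threshold from the text.

```python
from typing import Optional


def classify_antispam(raw_value: Optional[str], min_seconds: int = 10) -> str:
    """Classify a comment from the submitted 'antispam' field value.

    - Not an integer (e.g. still "lalalala"): the script never ran, so spam.
    - An integer below min_seconds: submitted too quickly, so spam.
    - An integer of min_seconds or more: approved.
    """
    try:
        seconds = int(raw_value)
    except (TypeError, ValueError):
        # "lalalala", a missing field (None) or any other non-integer.
        return "spam"
    return "approved" if seconds >= min_seconds else "spam"
```

As the reply to @AviD below points out, a targeted attacker can simply post a large number here; this only filters bots that blindly submit forms.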
The theory being that:
The downside to this method is that it requires JavaScript: if you don't have JavaScript enabled, your comment will be marked as spam. However, I do review comments marked as spam, so this is not a problem.
Response to comments
@MrAnalogy: The server-side approach sounds like quite a good idea and is effectively the same as doing it in JavaScript. Good call.
@AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.
Upvotes: 204
Reputation: 3831
What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trustworthy person to start the web, I suggest using this CAPTCHA to make sure he is absolutely and 100% human.
Rapidshare CAPTCHA - Riemann Hypothesis http://codethief.eu/kram/_/rapidshare_captcha2.jpg
Certainly, there's a tiny chance he'd be too busy preparing his Fields Medal speech to help us build up the web of trust, but well...
Upvotes: 20
Reputation: 18792
I've been using http://stopforumspam.com as a first line of defense against bots. On the sites where I've implemented it, it stops almost all spammers without the use of a CAPTCHA.
Upvotes: 3
Reputation: 1
I think Bitcoin makes a great, practical, non-image-based CAPTCHA; see http://bitcoin.org for the details.
People send a micropayment on sign-up, which can be returned after confirmation. You don't get back the time you spent trying to figure out a CAPTCHA.
Upvotes: 1
Reputation: 2606
This one uses 1px blocks to generate what looks like an image but is pure HTML/CSS. See the link here for an example: http://www.nujij.nl/registreren.2051061.lynkx?_showInPopup=true
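A rough Python sketch of how such markup could be generated on the server. The `bitmap_to_html` helper and the "#"-per-pixel input format are assumptions for illustration, not how the linked site does it; real use would also need a font bitmap for the characters plus some noise.

```python
from typing import List


def bitmap_to_html(bitmap: List[str], color: str = "#000") -> str:
    """Render a bitmap as absolutely positioned 1px <div> blocks.

    Each string in `bitmap` is one row; '#' marks a filled pixel and any
    other character is left blank. The result is pure HTML/CSS, no image.
    """
    height = len(bitmap)
    width = max((len(row) for row in bitmap), default=0)
    parts = [f'<div style="position:relative;width:{width}px;height:{height}px">']
    for y, row in enumerate(bitmap):
        for x, cell in enumerate(row):
            if cell == "#":
                parts.append(
                    f'<div style="position:absolute;left:{x}px;top:{y}px;'
                    f'width:1px;height:1px;background:{color}"></div>'
                )
    parts.append("</div>")
    return "".join(parts)
```

Note the trade-off: every pixel position is sitting in the markup, so a bot can rebuild the bitmap from the `left`/`top` values as easily as it could decode an image.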
Upvotes: 4
Reputation: 17651
The fix-the-syntax-error CAPTCHA:
echo "Hello, world!;
for (int $i = 0; $i < 10; $i ++ {
echo $i /*
}
The parens and quotes are randomly removed.
Bots can automatically check for syntax errors, but they don't know how to fix them!
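A minimal Python sketch of the generator and checker, assuming the server keeps the original snippet (e.g. in the session) and that an answer counts only if it matches that original once whitespace is ignored. `make_challenge`, `check_answer` and `EXAMPLE_SNIPPET` (a syntactically valid PHP version of the loop) are illustrative names.

```python
import random
import re
from typing import Optional, Tuple

# Characters whose removal breaks the syntax: parens, braces and double quotes.
REMOVABLE = set('(){}"')

# A syntactically valid PHP snippet to start from (illustrative).
EXAMPLE_SNIPPET = 'echo "Hello, world!";\nfor ($i = 0; $i < 10; $i++) {\n    echo $i;\n}'


def make_challenge(snippet: str, removals: int = 2,
                   rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (broken_snippet, expected_answer) by deleting random delimiters."""
    rng = rng or random.Random()
    positions = [i for i, ch in enumerate(snippet) if ch in REMOVABLE]
    dropped = set(rng.sample(positions, min(removals, len(positions))))
    broken = "".join(ch for i, ch in enumerate(snippet) if i not in dropped)
    return broken, snippet


def _normalize(code: str) -> str:
    # Drop all whitespace so harmless reformatting of the answer is accepted.
    return re.sub(r"\s+", "", code)


def check_answer(answer: str, expected: str) -> bool:
    """Accept the answer only if it restores the original snippet."""
    return _normalize(answer) == _normalize(expected)
```

Comparing against the original is strict: a user who repairs the code in a different but equally valid way would be rejected, so the removable characters should be chosen where only one fix makes sense.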
Upvotes: 2
Reputation: 312
Have you tried http://sblam.com/en.html? From what I know, it's a good alternative to CAPTCHAs, and it's completely transparent to users.
Upvotes: 1
Reputation: 16656
Recently, I started adding a <textarea> with the name and id set to "message". I hide it with CSS (display:none). Spam bots see it, fill it in and submit the form. Server-side, if that textarea is filled in, I mark the post as spam.
Another technique I'm working on is randomly generating names and ids, with some being spam checks and others being regular fields.
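The randomized-name idea could look roughly like this in Python. It is a sketch assuming the mapping from random names to real fields is stored server-side per form render (for example in the session); `build_form_fields`, `decode_submission` and the `f`-prefixed hex names are hypothetical choices, not the poster's code.

```python
import secrets
from typing import Dict, List, Tuple


def _random_name() -> str:
    # Prefix with a letter so the name also works as an HTML id / CSS selector.
    return "f" + secrets.token_hex(8)


def build_form_fields(real_fields: List[str],
                      decoys: int = 2) -> Tuple[Dict[str, str], List[str]]:
    """Return (mapping, decoy_names) for one form render.

    `mapping` maps each random name to the real field it stands for; the
    decoy names are honeypots to be hidden with CSS.
    """
    mapping = {_random_name(): field for field in real_fields}
    decoy_names = [_random_name() for _ in range(decoys)]
    return mapping, decoy_names


def decode_submission(form: Dict[str, str], mapping: Dict[str, str],
                      decoy_names: List[str]) -> Dict[str, str]:
    """Translate random names back to real fields; raise if a decoy was filled."""
    if any(form.get(name, "").strip() for name in decoy_names):
        raise ValueError("a decoy field was filled in; treating post as spam")
    return {real: form.get(rand, "") for rand, real in mapping.items()}
```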
This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)
Upvotes: 7
Reputation: 1
What about audio? Provide an audio sample with a voice saying something, and let the user type what they heard. It could also be a sound effect for them to identify.
As a bonus, this could help speech recognizers create closed captions, just as reCAPTCHA helps digitize books.
Probably stupid... just got this idea.
Upvotes: 1
Reputation:
On my blog I don't accept comments unless JavaScript is on, and I post them via Ajax. It keeps out all bots. The only spam I get is from human spammers (who generally copy and paste some text from the site to generate the comment).
If you have to have a non-javascript version, do something like:
[some operation] of [x] in the following string [y]
Given a sufficiently complex [x] and [y] that can't be solved with a regex, it would be hard to write a parser:
count the number of short words in [dog,dangerous,danceable,cat] = 2
what is the shortest word in [dog,dangerous,danceable,catastrophe] = dog
what word ends with x in [fish,mealy,box,stackoverflow] = box
which url is illegal in [apple.com, stackoverflow.com, fish oil.com] = fish oil.com
All of this can easily be done server-side. If the number of options is large enough and they rotate frequently, it would be tough to get them all; plus, never give the same user the same type more than once per day or something.
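A small Python sketch of such a server-side generator. The word lists, the "three letters or fewer" definition of a short word and the function names are assumptions for illustration (the URL question is left out). Each generator builds its word list so the expected answer is unambiguous, e.g. exactly one word ends with the asked-for letter.

```python
import random
from typing import Callable, List, Tuple

SHORT = ["dog", "cat", "box", "sun", "pen", "map"]  # three letters each
LONG = ["fish", "mealy", "dangerous", "danceable", "catastrophe",
        "stackoverflow", "window", "garden"]         # four or more letters


def _fmt(words: List[str]) -> str:
    return "[" + ",".join(words) + "]"


def count_short(rng: random.Random) -> Tuple[str, str]:
    # "Short" is assumed to mean three letters or fewer.
    k = rng.randint(1, 3)
    words = rng.sample(SHORT, k) + rng.sample(LONG, 4 - k)
    rng.shuffle(words)
    return f"count the number of short words in {_fmt(words)}", str(k)


def shortest(rng: random.Random) -> Tuple[str, str]:
    # Exactly one short word, so the shortest word is unambiguous.
    target = rng.choice(SHORT)
    words = [target] + rng.sample(LONG, 3)
    rng.shuffle(words)
    return f"what is the shortest word in {_fmt(words)}", target


def ends_with(rng: random.Random) -> Tuple[str, str]:
    # Distractors must end in a different letter than the target.
    target = rng.choice(SHORT + LONG)
    others = [w for w in SHORT + LONG if w[-1] != target[-1]]
    words = [target] + rng.sample(others, 3)
    rng.shuffle(words)
    return f"what word ends with {target[-1]} in {_fmt(words)}", target


GENERATORS: List[Callable[[random.Random], Tuple[str, str]]] = [
    count_short, shortest, ends_with]


def make_question(rng: random.Random) -> Tuple[str, str]:
    """Return (question, expected_answer) from a randomly chosen generator."""
    return rng.choice(GENERATORS)(rng)
```

A real deployment would need far larger, frequently rotated word pools, as the answer says; with lists this small a bot could simply memorize them.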
Upvotes: 3
Reputation: 65436
Tying it into the chat rooms would be a fun way of doing a captcha. A sort of live Turing test. Obviously it'd rely on someone being online to ask a question.
Upvotes: 1
Reputation: 193
Why not set simple programming problems that users can answer in their favourite language, then run the code on the server and see if it works? Avoid human CAPTCHA farms by running the answer against a different random text.
Example: "Extract domain name from - s = [email protected]"
Answer in Python: "return = etc."
Similar domain-specific knowledge could be used for the other sub-sites.
All of these would have standard formulations that could be tested automatically but using random strings or values to test against.
Obviously this idea has many flaws ;)
Also, only allow one login attempt per five-minute period.
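Here is a sketch of just the grading half of this idea in Python. It assumes the user's code has already been turned into a callable by some sandbox, a hypothetical step not shown here; executing untrusted code safely is the hard part. `passes_domain_challenge` feeds the callable freshly generated random email addresses, so an answer hard-coded for the displayed string fails.

```python
import random
import string
from typing import Callable, Optional


def _random_email(rng: random.Random) -> str:
    def part(n: int) -> str:
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
    return f"{part(6)}@{part(5)}.{rng.choice(['com', 'org', 'net'])}"


def passes_domain_challenge(candidate: Callable[[str], str], trials: int = 5,
                            seed: Optional[int] = None) -> bool:
    """Check an 'extract the domain name' solution on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        email = _random_email(rng)
        expected = email.split("@", 1)[1]
        try:
            if candidate(email) != expected:
                return False
        except Exception:
            # A crashing submission is simply a wrong answer.
            return False
    return True
```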
Upvotes: 1
Reputation: 12570
Just make the user solve simple arithmetic expressions:
2 * 5 + 1
2 + 4 - 2
2 - 2 * 3
etc.
Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:
import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby
Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.
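Joking aside, the server still has to compute the expected answer, and it should do that without calling `eval` itself. A minimal Python sketch, assuming challenges use single-digit operands and only `+`, `-` and `*`. `safe_eval` walks the parsed syntax tree with normal precedence, so `2 - 2 * 3` gives -4. Both function names are illustrative.

```python
import ast
import operator
import random
from typing import Tuple

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def safe_eval(expr: str) -> int:
    """Evaluate an integer expression using only +, - and * (usual precedence)."""
    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        # Names, calls, attribute access etc. are all rejected.
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))


def make_arithmetic_challenge(rng: random.Random) -> Tuple[str, int]:
    """Return (expression, expected_answer), e.g. ("2 * 5 + 1", 11)."""
    a, b, c = (rng.randint(1, 9) for _ in range(3))
    op1, op2 = rng.choice("+-*"), rng.choice("+-*")
    expr = f"{a} {op1} {b} {op2} {c}"
    return expr, safe_eval(expr)
```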
Upvotes: 16
Reputation: 2977
I had a vBulletin forum that got tons of spam. Adding one extra rule fixed it all: making people type in the capital letters of a word. As our website is named 'TrefPuntMagic', they had to type in 'TPM'. I know it is not dynamic, and if a spammer really wants to spam our site they can work around it, but we're just one of many, many vBulletin forums they target, and this is an easy fix.
Upvotes: 1
Reputation: 12139
How about just checking to see if JavaScript is enabled?
Anyone using this site is surely going to have it enabled. And from what folks say, spambots won't have JavaScript enabled.
Upvotes: 1
Reputation: 41
CAPTCHAs check whether you are a human or a computer. The problem is that a computer then has to judge whether you are human.
So a solution would be to let one user fill out a CAPTCHA and let the next user check it. The problem is of course the time gap.
Upvotes: 1
Reputation: 65436
Do many of these JavaScript solutions work with screen readers? And images without a meaningful alt attribute probably break WCAG.
Upvotes: 1
Reputation: 63949
Someone also suggested the Raphael JavaScript library, which apparently lets you draw on the client in all popular browsers:
http://dmitry.baranovskiy.com/raphael/
...but that wouldn't exactly work with my <noscript> case, now would it? :)
Upvotes: 3
Reputation: 47934
Be sure it isn't something Google can answer, though. Which also shows an issue with that: order of operations!
Upvotes: 25
Reputation: 15670
Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.
E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh
In this case "stack" would be the CAPTCHA. There are obviously numerous variations on this idea.
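A possible Python sketch of the generator, assuming the hidden word is plain letters and that bold (`<b>`) is the highlighting used; `make_highlight_challenge` and the 40-character noise length are illustrative choices.

```python
import random
import string
from html import escape
from typing import Optional, Tuple


def make_highlight_challenge(word: str, noise_len: int = 40,
                             rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Hide `word` in random lowercase noise, wrapping its letters in <b>.

    Returns (html, answer). The word's letters keep their order but land at
    random positions in the string.
    """
    rng = rng or random.Random()
    total = noise_len + len(word)
    # Random slots for the letters; filling left to right preserves order.
    slots = set(rng.sample(range(total), len(word)))
    letters = iter(word)
    out = []
    for i in range(total):
        if i in slots:
            out.append(f"<b>{escape(next(letters))}</b>")
        else:
            out.append(rng.choice(string.ascii_lowercase))
    return "".join(out), word
```

As written, a bot can simply pull the letters out of the `<b>` tags, so in practice the highlighting would need to be harder to parse than a fixed tag.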
Edit: Example variations to address some of the potential problems identified with this idea:
Upvotes: 14
Reputation: 46846
@lance
Who says you have to create all the images on the server with each request? Maybe you could have a static list of images or pull them from Flickr. I like the "click on the kitten" CAPTCHA idea. http://www.thepcspy.com/kittenauth.
If you pull from a static list of images, it becomes trivial to circumvent the CAPTCHA, because a human can classify them and then the bot would be able to answer the challenges easily. Even if a bot can't answer all of them, it can still spam. It only needs to be able to answer a small percentage of CAPTCHAs, because it can always just retry when an attempt fails.
This is actually a problem with puzzles and such, too, because it's extremely difficult to have a large set of challenges.
Upvotes: 1
Reputation: 1322
My solution was to put the form on a separate page and pass a timestamp to it. On that page I only display the form if the timestamp is valid (not too fast, not too old). I found that bots would always hit the submission page directly and only humans would navigate there correctly.
It won't work if you have the form on the content page itself, as you do now, but you could show/hide the link to the special submission page based on NoScript. A minor inconvenience for such a small percentage of users.
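A Python sketch of the "not too fast, not too old" check. The HMAC signature is an addition the answer does not mention; it stops a bot from simply inventing a timestamp. `SECRET_KEY`, the 5-second and one-hour limits, and the function names are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical server-side secret


def _sign(ts: str) -> str:
    return hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()


def make_token(now: Optional[float] = None) -> str:
    """Timestamp plus signature, to be passed along to the form page."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(ts)}"


def token_is_valid(token: str, min_age: int = 5, max_age: int = 3600,
                   now: Optional[float] = None) -> bool:
    """Accept only untampered tokens that are not too fresh and not too old."""
    try:
        ts, sig = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False
    # Constant-time comparison of the signature (as bytes).
    if not hmac.compare_digest(sig.encode(), _sign(ts).encode()):
        return False
    age = (time.time() if now is None else now) - issued
    return min_age <= age <= max_age
```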
Upvotes: 2
Reputation: 608
Unless I'm missing something, what's wrong with using reCAPTCHA, since all the work is done externally?
Just a thought.
Upvotes: 56
Reputation: 265
Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.
If it's simple enough, it won't be difficult to code around it. I see two threats here:
With simple arithmetic, you might fend off threat #1, but not threat #2.
Upvotes: 6
Reputation: 14719
Use a simple text CAPTCHA and then ask the users to enter the answer backwards or only the first letter, or the last, or another random thing.
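A minimal Python sketch of pairing each instruction with the transformation that produces the expected answer. The three instructions, `make_transform_challenge` and the case-insensitive comparison are assumptions for illustration.

```python
import random
from typing import Callable, Dict, Tuple

# Each instruction maps to the function that derives the expected answer.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "Type the word backwards": lambda w: w[::-1],
    "Type only the first letter of the word": lambda w: w[0],
    "Type only the last letter of the word": lambda w: w[-1],
}


def make_transform_challenge(word: str, rng: random.Random) -> Tuple[str, str]:
    """Return (prompt, expected_answer) for a randomly chosen instruction."""
    instruction = rng.choice(sorted(TRANSFORMS))
    return f'{instruction}: "{word}"', TRANSFORMS[instruction](word)


def check(answer: str, expected: str) -> bool:
    # Case-insensitive, ignoring surrounding whitespace.
    return answer.strip().lower() == expected.lower()
```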
Another idea is to make an ASCII image, like this one (from the end sequence of the game Portal):
.,---.
,/XM#MMMX;,
-%##########M%,
-@######% $###@=
.,--, -H#######$ $###M:
,;$M###MMX; .;##########$;HM###X=
,/@##########H= ;################+
-+#############M/, %##############+
%M###############= /##############:
H################ .M#############;.
@###############M ,@###########M:.
X################, -$=X#######@:
/@##################%- +######$-
.;##################X .X#####+,
.;H################/ -X####+.
,;X##############, .MM/
,:+$H@M#######M#$- .$$=
.,-=;+$@###X: ;/=.
.,/X$; .::,
., ..
And give the user some options like: IS A, LIE, BROKEN HEART, CAKE.
Upvotes: 2