Praveen Angyan

Reputation: 7265

Is there a free anti spam database?

WordPress has a spam-filtering plugin called Akismet that seems able to classify any block of text as spam or not. The caveat is that you have to go through their interface; their database and algorithm are not open source or otherwise readily available.

There are also commercial providers that offer a web-accessible API for classifying emails, comments, or any other text submitted by users of your web application.

Is there any sort of open source or freely accessible database that can classify a block of text as spam/non-spam?

Edit: Here's a clearer explanation of what I want

Basically I was hoping that there was an extensive database out there with the probabilities of certain phrases being spam. Since (I'm assuming) spammers spam all email addresses equally, by pre-populating my Bayesian spam filter with this database, I could create an application that starts off by capturing most spam without any user training.
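To make the idea concrete, here is a minimal sketch of a naive Bayes scorer that starts from a pre-populated token table instead of user training. The table `SEED_PROBS`, its values, the prior and the fallback for unseen tokens are all made up for illustration; a real seed database would supply them.

```python
import math
import re

# Hypothetical seed data: token -> (P(token | spam), P(token | ham)).
# These numbers are invented for illustration, not taken from any real corpus.
SEED_PROBS = {
    "viagra": (0.20, 0.001),
    "free": (0.15, 0.03),
    "meeting": (0.005, 0.08),
}
PRIOR_SPAM = 0.5       # assumed prior probability that a message is spam
UNSEEN = (0.01, 0.01)  # tokens missing from the table count as neutral evidence


def spam_probability(text, probs=SEED_PROBS, prior_spam=PRIOR_SPAM):
    """Return P(spam | text) under a naive Bayes model seeded from `probs`."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    # Work in log space so long messages do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for token in tokens:
        p_spam, p_ham = probs.get(token, UNSEEN)
        log_spam += math.log(p_spam)
        log_ham += math.log(p_ham)
    # Normalise the two scores into a probability.
    top = max(log_spam, log_ham)
    spam_score = math.exp(log_spam - top)
    ham_score = math.exp(log_ham - top)
    return spam_score / (spam_score + ham_score)
```

With a table like this, `spam_probability("free viagra")` comes out close to 1 before any user has trained the filter, which is the behaviour I am after. User training would then only need to adjust the seeded values.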

Upvotes: 5

Views: 2949

Answers (3)

Ingwie Phoenix

Reputation: 2983

This may be a dead question by now, but check out http://www.stopforumspam.com. You can use their API to check an IP address, username, or email against their database. I advise using cURL with its timeout parameter, though, since the service sometimes times out on you.
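For anyone not using cURL, a rough Python sketch of the same lookup with a timeout might look like the following. The endpoint `https://api.stopforumspam.org/api`, the `json` flag and the `appears` field in the response are assumptions written from memory, so check the service's own documentation before relying on them. The function names are just illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the service's documentation.
API_URL = "https://api.stopforumspam.org/api"


def build_query(ip=None, email=None, username=None):
    """Build the lookup URL for whichever fields were supplied."""
    fields = (("ip", ip), ("email", email), ("username", username))
    params = {name: value for name, value in fields if value}
    if not params:
        raise ValueError("supply at least one of ip, email, username")
    # The trailing `json` flag (assumed) asks for a JSON response.
    return API_URL + "?" + urllib.parse.urlencode(params) + "&json"


def listed_fields(payload):
    """Return the field names that the (assumed) JSON response marks as listed."""
    data = json.loads(payload)
    return [name for name, info in data.items()
            if isinstance(info, dict) and info.get("appears")]


def check(timeout=5.0, **fields):
    """Query the service; return the listed field names, or None if unreachable."""
    try:
        with urllib.request.urlopen(build_query(**fields), timeout=timeout) as resp:
            return listed_fields(resp.read().decode("utf-8"))
    except OSError:
        # URLError and socket timeouts both derive from OSError.
        return None
```

Returning `None` when the request fails or times out lets the caller decide whether to accept the submission anyway or hold it for moderation, rather than blocking the page on a slow third-party service.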

Upvotes: 3

RichieHindle

Reputation: 281475

Probably not exactly what you're looking for, but the MoinMoin Wiki maintainers keep a central list of Wiki spam regular expressions here: http://master.moinmo.in/BadContent
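If you download that list, applying it is straightforward. This is only a sketch: it assumes the list is plain text with one regular expression per line and `#` for comment lines, which you should confirm against the actual page, and the function names are my own.

```python
import re


def compile_patterns(lines):
    """Compile one regex per line, skipping blanks, comments and invalid patterns."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line, re.IGNORECASE))
        except re.error:
            # Ignore entries that are not valid Python regular expressions.
            continue
    return patterns


def first_match(text, patterns):
    """Return the source of the first pattern found in `text`, or None."""
    for pattern in patterns:
        if pattern.search(text):
            return pattern.pattern
    return None
```

Returning the matching pattern rather than a plain boolean makes it easier to see why a given edit or comment was rejected.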

Upvotes: 2

Jon Galloway

Reputation: 53125

Update based on comment:

I don't think a simple database would do the trick. Most spam is algorithmically generated (e.g. comment spam usually incorporates content from the post). Akismet does a combination of things, probably including link analysis and use of known spam signatures, but they don't publish the details.

I've read about some interesting AI projects that classify good content rather than bad. You might also look at Spam Karma, which analyzes blog comments based on a variety of spammy triggers (a response posted immediately after the page loads, etc.).


Original answer (DNS blacklists):

Upvotes: 1
