ProgrammerGirl

How to check if users are spamming other users with similar messages?

One of the PHP/MySQL sites I administer is a social network, and I've noticed that spammers send lots of similar-looking messages to many other users.

Given the number of messages sent from the same account and how similar they are, it seems like it should be relatively easy to identify users who are spamming others this way, but I don't know how to do that in PHP/MySQL. The messages are stored in the database as type TEXT.

How can I identify these spammers so I can get rid of them automatically when they start sending too many similar-looking messages?

Edit:

Spam messages are normally at least a paragraph long, so we can safely ignore messages shorter than 100 characters and let those through automatically.

Answers (3)

Paul Gregory

A human can tell which senders are acceptable and which are spammers, especially one who can see everyone's messages. But you don't want to be reading every message!

First, you'll need to have a message flag or status so that a message can be added to the database, yet not appear in the recipient's inbox because spam is suspected.

Second, you'll need to have a user flag or status so that a user can be prevented from sending more messages, because spam is suspected.

I believe the best approach is:

  • Add three new fields to the messages table: words and links (both TEXT) and flagstatus (TINYINT)
  • Add one new field to the users table: spamwarnings (INT)
  • Have PHP process each message as it is added: strip out common words (a, the), the recipient's name and any URLs, then save the unique remaining words in words and the unique links in links.
  • Do a first pass of spam testing (see below) as the message is added to the database (because you already have the message text in PHP here, it's a good time to check it). If the score is high, flag it for automated/manual review.
  • Allow users to mark messages as spam
  • Hide flagged messages from users' inboxes and notifications
  • Run a second pass of spam scoring hourly
  • Have humans moderate suspected spam, releasing it or removing it
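A minimal PHP sketch of the word and link extraction step from the list above, assuming UTF-8 message text. The function names (`extractLinks`, `tokenize`, `extractWords`), the short stop-word list and the URL regex are hypothetical choices for illustration, not part of the original answer.

```php
<?php
// Sketch: build the values for the `words` and `links` columns.
// STOP_WORDS and URL_PATTERN are illustrative assumptions; tune both.

const STOP_WORDS = ['a', 'an', 'and', 'the', 'to', 'of', 'in', 'is', 'it', 'you'];
const URL_PATTERN = '~\b(?:https?://|www\.)\S+~i';

/** Return the unique URLs in $text, in order of first appearance. */
function extractLinks(string $text): array
{
    preg_match_all(URL_PATTERN, $text, $matches);
    return array_values(array_unique($matches[0]));
}

/** Lower-case (ASCII letters) and split on anything that is not a letter or digit. */
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

/** Return the unique words left after removing URLs, stop words and the recipient's name. */
function extractWords(string $text, string $recipientName): array
{
    // Remove URLs first so their fragments are not counted as words.
    $withoutUrls = preg_replace(URL_PATTERN, ' ', $text);
    $skip = array_merge(STOP_WORDS, tokenize($recipientName));
    return array_values(array_unique(array_diff(tokenize($withoutUrls), $skip)));
}
```

You might then store `implode(' ', $words)` and `implode(' ', $links)` in the two TEXT columns; a space-separated format is an assumption that works here because neither the extracted words nor the matched URLs contain whitespace.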

First-Pass Spam Scoring

  • Has the recipient ever sent a message to the sender? -10 for yes, +2 for no
  • Are there links in the message? +2 for yes, and +1 for each link
  • Does the message contain certain 'spammy' words? +1 for each word.
  • How many messages has the sender sent in the past hour? +1 for each.
  • Does the user have a spamwarnings count of 2 or more?

A total score of 5 or more would flag the message for review and increment the user's spamwarnings count.
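The first-pass rules could be combined into a single score roughly as follows. This is a sketch: the `SPAMMY_WORDS` list and the function names are hypothetical, the inputs are assumed to come from your own queries (for example a `COUNT(*)` of the sender's messages in the last hour), and the spamwarnings question is left out because the answer doesn't attach a score to it. Each distinct spammy word counts once, since `$words` holds unique words.

```php
<?php
// Sketch of the first-pass score. SPAMMY_WORDS is a hypothetical list.
const SPAMMY_WORDS = ['viagra', 'casino', 'loan', 'winner'];
const REVIEW_THRESHOLD = 5; // "a score of 5 would flag this for review"

function firstPassScore(
    bool $recipientHasMessagedSender,
    array $links,        // unique links found in the message
    array $words,        // unique words found in the message
    int $sentInPastHour  // messages this sender sent in the last hour
): int {
    // -10 if the recipient has written to the sender before, otherwise +2.
    $score = $recipientHasMessagedSender ? -10 : 2;
    if (count($links) > 0) {
        $score += 2 + count($links); // +2 for having links, +1 per link
    }
    $score += count(array_intersect($words, SPAMMY_WORDS)); // +1 per spammy word
    $score += $sentInPastHour; // +1 per message sent in the past hour
    return $score;
}

function needsReview(int $score): bool
{
    return $score >= REVIEW_THRESHOLD;
}
```

When `needsReview()` returns true, you would set the message's flagstatus and increment the sender's spamwarnings count, as described above.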

Second-Pass Spam Scoring

  • This is the part that would compare flagged messages with other flagged messages by the same sender using a combination of the other ideas on this page.
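For the second pass, one simple way to compare a sender's flagged messages is Jaccard similarity over their stored word sets (shared words divided by total distinct words). Jaccard is swapped in here for illustration rather than specified by the answer, the 0.6 default threshold is an untested assumption, and `jaccard`/`similarPairs` are hypothetical names. Each element of `$messages` is assumed to be the list of unique words saved for one flagged message.

```php
<?php
// Jaccard similarity of two word sets: |A ∩ B| / |A ∪ B|, from 0.0 to 1.0.
function jaccard(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = count(array_unique(array_merge($a, $b)));
    if ($union === 0) {
        return 0.0; // two empty messages: treat as not similar
    }
    return count(array_intersect($a, $b)) / $union;
}

// Count pairs of a sender's flagged messages whose word sets overlap by at
// least $threshold. A high count suggests the same text is being reused.
function similarPairs(array $messages, float $threshold = 0.6): int
{
    $messages = array_values($messages);
    $n = count($messages);
    $pairs = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            if (jaccard($messages[$i], $messages[$j]) >= $threshold) {
                $pairs++;
            }
        }
    }
    return $pairs;
}
```

Pairwise comparison grows quadratically with the number of messages, which is fine for the handful of flagged messages one sender accumulates in an hour but not for a whole table.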

Human Moderation

  • I can't see how this can be avoided, but the steps above will reduce the number of messages to read. Moderators could also work from just the unique words and links rather than the full text, which preserves some privacy.

It should also be possible to use much of the above structure to moderate messages for inappropriate content.

Gustek

Spam messages will contain a link, so you can let through any message without one.

You should also try to prevent spam in the first place: if one user starts sending many messages to many users in a short time, it is probably spam.

You can do this with a counter stored in the session: increment it each time the user sends a message to a new recipient, and if it goes over 20 per hour (I just made this number up; you will need to test to find an effective value), the user may be spamming. At that point, start asking for a CAPTCHA or block their chat for 15 minutes, and report them to an admin to check manually.
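A sketch of the session counter, assuming PHP 7.4+ for the arrow function. Rather than a single running number, it records when each new recipient was first contacted and counts those within the last hour. `$store` is meant to be `$_SESSION` (after `session_start()`), but any array works, which keeps it testable. The function and constant names are hypothetical, and 20 is the answer's made-up limit.

```php
<?php
// Hypothetical limits: 20 new recipients per rolling hour.
const MAX_NEW_RECIPIENTS_PER_HOUR = 20;
const WINDOW_SECONDS = 3600;

/**
 * Record a message to $recipientId and return true if the sender has now
 * contacted more than the allowed number of distinct recipients in the window.
 */
function recordMessageAndCheck(array &$store, int $recipientId, int $now): bool
{
    // recipientId => time of first contact within the current window
    $seen = $store['recipients'] ?? [];
    // Forget recipients first contacted before the window started (keys are kept).
    $seen = array_filter($seen, fn (int $t) => $t > $now - WINDOW_SECONDS);
    if (!isset($seen[$recipientId])) {
        $seen[$recipientId] = $now; // a new recipient for this window
    }
    $store['recipients'] = $seen;
    return count($seen) > MAX_NEW_RECIPIENTS_PER_HOUR;
}
```

For example, `if (recordMessageAndCheck($_SESSION, $recipientId, time())) { ... }` is where you would ask for a CAPTCHA, apply the 15-minute block or notify an admin. Because a client can reset its session by discarding the session cookie, keeping the same data in a database table keyed by user ID would be harder to evade.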

Mark Topper

You can search for earlier messages by the same user that look like the one being posted now with a MySQL full-text search:

-- Bind the new message text and the sender's user ID to the two ? placeholders.
SELECT * FROM `messages`
WHERE MATCH (`messages`.`content`) AGAINST (?)
AND `messages`.`user` = ?

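That selects this user's earlier messages that share some of the new message's content. Note that MATCH ... AGAINST requires a FULLTEXT index on the content column, and the message text should be passed as a bound parameter rather than pasted into the query string.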
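The same query run through PDO with bound parameters, so the message text is never concatenated into the SQL. This is a sketch: it assumes a MySQL `messages` table with a FULLTEXT index on `content`, an integer `user` column and an `id` column; `findSimilarMessages` and its `$limit` parameter are hypothetical.

```php
<?php
// Sketch: full-text search for a user's similar earlier messages via PDO.
function findSimilarMessages(PDO $db, string $message, int $userId, int $limit = 10): array
{
    // $limit is typed int, so appending it to the SQL cannot inject anything.
    $sql = 'SELECT `id`, `content` FROM `messages`
            WHERE MATCH (`content`) AGAINST (:message)
              AND `user` = :user
            LIMIT ' . $limit;
    $stmt = $db->prepare($sql);
    $stmt->execute([':message' => $message, ':user' => $userId]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Full-text indexes on InnoDB tables need MySQL 5.6 or later; on older versions the table has to use MyISAM.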

Hope it helps.
