Sherif Buzz

Reputation: 1226

PHP code to check for repeating characters / bogus text

I'm running a dating site with a field where people enter their profile. I already have a bad-words filter, but now I have a problem where people enter a profile that is just garbage characters, or just "aaaaaaaaaaaaaaaaaaaa" or "--------------" etc. I'm looking for an effective way of filtering out long words made of repeated characters. Thanks in advance.

Upvotes: 2

Views: 491

Answers (3)

oezi

Reputation: 51797

This should do it (but it will also collapse legitimate double characters, so maybe you need to edit it a bit):

$text = preg_replace('{(.)\1+}', '$1', $text);
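
Because the one-liner above also turns "coffee" into "cofe", a variant with a minimum run length may be closer to what the question asks for. The following is only a sketch: `hasLongRun`, `squeezeRuns` and their default thresholds are made-up names and values for illustration, not part of PHP, and arrow functions assume PHP 7.4 or newer.

```php
<?php
// Hypothetical helper: true if $text contains a run of at least $minRun
// identical characters, e.g. "aaaaaaaa" or "--------".
function hasLongRun(string $text, int $minRun = 4): bool
{
    // (.) captures one character; \1{N,} requires at least N more copies of it.
    // "s" lets . match newlines, "u" treats the input as UTF-8.
    $pattern = '/(.)\1{' . ($minRun - 1) . ',}/su';
    // preg_match returns false on invalid UTF-8, which is reported as "no run" here.
    return preg_match($pattern, $text) === 1;
}

// Hypothetical helper: shorten every run longer than $maxRun characters
// to exactly $maxRun characters, leaving normal doubles untouched.
function squeezeRuns(string $text, int $maxRun = 2): string
{
    $pattern = '/(.)\1{' . $maxRun . ',}/su';
    $result = preg_replace_callback(
        $pattern,
        fn(array $m): string => str_repeat($m[1], $maxRun),
        $text
    );
    // On a regex error (e.g. invalid UTF-8) fall back to the unchanged input.
    return $result ?? $text;
}
```

With these defaults, `hasLongRun()` can be used to reject a profile outright, while `squeezeRuns()` keeps the text but trims the noise. The right thresholds depend on the language of the profiles and are a judgment call.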

OT: can't believe there are still people who use bad-word filters...

Upvotes: 2

Piskvor left the building

Reputation: 92752

You could use a word list and flag each message that contains long words (e.g. 5+ characters) that are not on the list. If the field contains five 8-letter words and none of them are in a dictionary, it is likely not meaningful data.
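
A rough sketch of that idea, assuming the word list is already loaded into an array keyed by lowercase word. The function name `unknownLongWordRatio`, the 5-letter minimum and any cut-off you compare the ratio against are illustrative assumptions, not an established recipe.

```php
<?php
// Hypothetical helper: fraction of long words (at least $minLen letters)
// that do not appear in $dictionary.
// $dictionary is assumed to map lowercase words to any value, e.g. array_flip($wordList).
function unknownLongWordRatio(string $text, array $dictionary, int $minLen = 5): float
{
    // Collect every run of $minLen or more letters; "u" treats the input as UTF-8.
    // strtolower only lowercases ASCII letters, which is assumed to be enough here.
    $found = preg_match_all('/\p{L}{' . $minLen . ',}/u', strtolower($text), $matches);
    if (!$found) {
        // No long words at all (or a regex error): nothing to judge.
        return 0.0;
    }

    $unknown = 0;
    foreach ($matches[0] as $word) {
        if (!isset($dictionary[$word])) {
            $unknown++;
        }
    }
    return $unknown / count($matches[0]);
}
```

A profile could then be flagged for review when the ratio is high, for instance above 0.5. Note that this only looks at letters, so input such as "--------------" scores 0.0 and still needs a separate repeated-character check.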

Upvotes: 0

miku

Reputation: 188014

Maybe you need something like a Bayesian spam filter for that kind of stuff.

Particular words have particular probabilities of occurring in spam email and in legitimate email. For instance, most email users will frequently encounter the word "Viagra" in spam email, but will seldom see it in other email. The filter doesn't know these probabilities in advance, and must first be trained so it can build them up. To train the filter, the user must manually indicate whether a new email is spam or not. ...
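
To make the training and classification steps in the quote concrete, here is a minimal naive Bayes sketch with add-one smoothing. The class `TinyBayes`, its methods and the two fixed labels `'spam'` and `'ham'` are hypothetical names chosen for illustration; typed properties assume PHP 7.4 or newer. A real filter would need far more training data and probably persistence.

```php
<?php
// Hypothetical minimal naive Bayes text classifier (labels: 'spam' and 'ham').
class TinyBayes
{
    private array $wordCounts = ['spam' => [], 'ham' => []]; // per-label word frequencies
    private array $totalWords = ['spam' => 0, 'ham' => 0];   // per-label token totals
    private array $docCounts  = ['spam' => 0, 'ham' => 0];   // per-label training examples
    private array $vocabulary = [];                          // every word seen in training

    private function tokenize(string $text): array
    {
        // Lowercase and split on anything that is not an ASCII letter or digit.
        return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    // $label must be 'spam' or 'ham'; a human supplies it during training.
    public function train(string $text, string $label): void
    {
        $this->docCounts[$label]++;
        foreach ($this->tokenize($text) as $word) {
            $this->wordCounts[$label][$word] = ($this->wordCounts[$label][$word] ?? 0) + 1;
            $this->totalWords[$label]++;
            $this->vocabulary[$word] = true;
        }
    }

    public function classify(string $text): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $scores = [];
        foreach (['spam', 'ham'] as $label) {
            // Smoothed log prior of the label.
            $score = log(($this->docCounts[$label] + 1) / ($totalDocs + 2));
            foreach ($this->tokenize($text) as $word) {
                $count = $this->wordCounts[$label][$word] ?? 0;
                // Add-one smoothing; the extra +1 leaves room for unseen words.
                $score += log(($count + 1) / ($this->totalWords[$label] + $vocabSize + 1));
            }
            $scores[$label] = $score;
        }
        // Ties are resolved in favour of 'ham'.
        return $scores['spam'] > $scores['ham'] ? 'spam' : 'ham';
    }
}
```

For the profile problem, the same structure could be trained on examples of garbage and genuine profiles instead of spam and legitimate email. Whether word tokens are the right feature for keyboard-mash input is an open question; character pairs might work better.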

Upvotes: 2
