Norman

Reputation: 6365

PHP: Check string for certain words

How can I check if data submitted from a form or querystring has certain words in it?

I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it.

I'm converting from ASP to PHP. I used to do this using an array in ASP (keep all illegal words in a string and use ubound to check the whole string for those words), but is there a better (efficient) way to do this in PHP?

E.g.: A string like this would be rejected: "The administrator dropped a blah blah" because it has admin and drop in it.

I intend to use this to check usernames when creating accounts, and for other things too.

Thanks

Upvotes: 1

Views: 3552

Answers (7)

Nate Starner

Reputation: 521

This is actually pretty simple: use substr_count.

An example for you would be:

if (substr_count($variable_to_search, "drop"))
{
    echo "error";
}

And to make things even simpler, put your keywords (e.g. "drop", "create", "alter") in an array and use foreach to check them. That way you cover all your words. An example:

foreach ($keywordArray as $keyword)
{
    if (substr_count($variable_to_search, $keyword))
    { 
        echo "error"; // or do whatever you want to do when you find something you don't like
    }
}
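
A minimal sketch of the same loop wrapped in a function, assuming you want to know which keywords matched rather than just echo an error. The name findKeywords and the sample input are hypothetical, and stripos is swapped in for substr_count so the check ignores case.

```php
<?php
// Hypothetical helper: returns the keywords that occur anywhere in $text.
function findKeywords(string $text, array $keywords): array
{
    $found = array();
    foreach ($keywords as $keyword) {
        // stripos returns false when there is no match; position 0 is a valid
        // hit, so compare with !== false instead of relying on truthiness.
        if (stripos($text, $keyword) !== false) {
            $found[] = $keyword;
        }
    }
    return $found;
}

$keywordArray = array("drop", "create", "alter");
$hits = findKeywords("The administrator Dropped a table", $keywordArray);
echo $hits ? "error: " . implode(", ", $hits) : "ok";
```

Returning the list of hits keeps the decision (reject, log, show a message) in the calling code.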

Upvotes: 0

Jeffrey Blake

Reputation: 9709

You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.

Originally, I was thinking you could do this with a simple preg_match() call (hence the downvote); however, preg_match does not accept an array of patterns. Instead, you can use preg_replace to replace every rejected string with nothing, and then check whether the string has changed. This is simple and avoids writing a loop iteration for each rejected string.

$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
   //do stuff
} else { 
   //reject
}

Note also that you can provide case-insensitive searches by using the i flag on the regex patterns, changing the array of patterns to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
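
As a sketch of the same idea, assuming the rejected words arrive as plain strings rather than ready-made patterns: the hypothetical helper below builds the case-insensitive patterns itself and reads the replacement count that preg_replace reports through its fifth argument, instead of comparing the input with the result.

```php
<?php
// Hypothetical helper: true when any rejected word occurs in $input.
function containsRejected(string $input, array $words): bool
{
    // Build "/word/i" patterns; preg_quote escapes regex metacharacters
    // and the "/" delimiter.
    $patterns = array_map(function ($w) {
        return '/' . preg_quote($w, '/') . '/i';
    }, $words);

    // The fifth argument receives the number of replacements performed.
    preg_replace($patterns, '', $input, -1, $count);
    return $count > 0;
}

$rejected = array("admin", "drop", "create");
var_dump(containsRejected("The Administrator dropped a table", $rejected));
var_dump(containsRejected("Nothing to see here", $rejected));
```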

On Efficiency

There has been some debate about the efficiency of doing it this way vs the accepted nested loop method. I ran some tests and found the preg_replace method executed around twice as fast as the nested loop. Here is the code and output of those tests:

$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";

$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";

$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (preg_check($rejectedStrs, $input)) $p_matches++;
    if (preg_check($rejectedStrs, $input2)) $p_matches++;
    if (preg_check($rejectedStrs, $input3)) $p_matches++;
    if (preg_check($rejectedStrs, $input4)) $p_matches++;
}

$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (loop_check($rejectedStrs, $input)) $l_matches++;
    if (loop_check($rejectedStrs, $input2)) $l_matches++;
    if (loop_check($rejectedStrs, $input3)) $l_matches++;
    if (loop_check($rejectedStrs, $input4)) $l_matches++;
}

$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);

// Returns true when no pattern matched, i.e. the input came back unchanged.
function preg_check($rejectedStrs, $input) {
    if($input == preg_replace($rejectedStrs, "", $input)) 
        return true;
    return false;
}

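function loop_check($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            // true when $word starts with $bw (case-insensitive)
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
        // as written, this returns after the first word has been checked
        return false;
    }

}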

Output:

preg_match: 1281908071.4032 1281908071.9947= 0.5915060043335

loop_match: 1281908071.9947 1281908073.006=1.0112948417664

Upvotes: 0

NullUserException

Reputation: 85478

You could use stripos()

int stripos ( string $haystack , string $needle [, int $offset = 0 ] )

You could have a function like:

function checkBadWords($str, $badwords) {
    foreach ($badwords as $word) {
        if (stripos(" $str ", " $word ") !== false) {
            return false;
        }
    }
    return true;
}

And to use it:

if (!checkBadWords('something admin', array('admin'))) {
    // ...
}
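
The padded comparison above only matches a word that has a space on both sides, so input such as "Hello, admin!" would pass. A minimal sketch of one way around that, assuming ASCII input: normalize punctuation to spaces before the same stripos check. The name hasBadWord is hypothetical, and note that it returns true when a bad word is found.

```php
<?php
// Hypothetical variant: whole-word, case-insensitive check that also
// treats punctuation as a word separator. Assumes ASCII input.
function hasBadWord(string $str, array $badwords): bool
{
    // Replace every run of non-alphanumeric characters with a single space.
    $normalized = preg_replace('/[^a-z0-9]+/i', ' ', $str);
    foreach ($badwords as $word) {
        // Padding both sides means only complete words can match.
        if (stripos(" $normalized ", " $word ") !== false) {
            return true;
        }
    }
    return false;
}

var_dump(hasBadWord("Hello, admin!", array("admin")));
var_dump(hasBadWord("The administrator left", array("admin")));
```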

Upvotes: 5

Knarf

Reputation: 1273

function check($string, $array) {
    foreach($array as $item) {
        // preg_quote keeps regex metacharacters in $item from being interpreted
        if (preg_match("/" . preg_quote($item, "/") . "/", $string))
            return true;
    }
    return false;
}

Upvotes: 0

Artefacto

Reputation: 97845

$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
    foreach ($badwords as $bw) {
        if (strpos($word, $bw) === 0) {
            //contains word $word that starts with bad word $bw
        }
    }
}

For JGB146, here is a performance comparison with regular expressions:

<?php
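function has_bad_words($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            // true when $word starts with $bw (case-insensitive)
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
        // as written, this returns after the first word has been checked
        return false;
    }

}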

function has_bad_words2($badwords, $string) {

    $regex = array_map(function ($w) {
        return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
    $regex = "/" . implode("|", $regex) . "/";
    return preg_match($regex, $string) != 0;

}

$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words2($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

Example output:

elapsed: 0.076514959335327
elapsed: 0.29999899864197

So in this test the regular expression version was roughly four times slower.
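
In has_bad_words as listed, the return false sits inside the outer foreach, so the function stops after examining the first word. For anyone rerunning the comparison, here is a sketch of a variant that examines every word. The name has_bad_words_all is hypothetical, and no timings were taken for it.

```php
<?php
// Hypothetical variant of has_bad_words that examines every word.
function has_bad_words_all(array $badwords, string $string): bool
{
    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            // === 0 means $word starts with $bw (case-insensitive).
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    // Reached only after every word was checked without a match.
    return false;
}

var_dump(has_bad_words_all(array("admin", "drop"), "The administrator dropped a table"));
var_dump(has_bad_words_all(array("abc", "def"), "The quick brown fox"));
```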

Upvotes: 2

You could use a regular expression like this:

preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);

Building the pattern string from an array:

$pattern = implode(")|(", $banned_words);
$pattern = "~(".$pattern.")~";
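
If the banned words may contain regex metacharacters, the pattern can break or match the wrong thing. A minimal sketch that escapes each word first, assuming a non-empty word list (an empty list would produce a pattern that matches everything). The name buildBannedPattern is hypothetical, and the i flag and non-capturing group are additions for illustration.

```php
<?php
// Hypothetical helper: builds one alternation pattern from a word list.
function buildBannedPattern(array $banned_words): string
{
    // Escape regex metacharacters and the "~" delimiter in each word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '~');
    }, $banned_words);
    // (?:...) groups without capturing; the i flag ignores case.
    return '~(?:' . implode('|', $quoted) . ')~i';
}

$pattern = buildBannedPattern(array("admin", "drop", "c++"));
var_dump(preg_match($pattern, "I like C++") === 1);
var_dump(preg_match($pattern, "harmless text") === 1);
```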

Upvotes: 0

thomasrutter

Reputation: 117401

strpos() will let you search for a substring within a larger string. It's quick and works well. It returns false if the string's not found, and a number (which could be zero, so you need to use === to check) if it finds the string.

stripos() is a case-insensitive version of the same.
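
A short illustration of why the strict comparison matters, using a made-up string where the match sits at position 0:

```php
<?php
$haystack = "admin panel";

// strpos returns 0 here because the match is at the very start.
$pos = strpos($haystack, "admin");

var_dump($pos);            // int(0)
var_dump($pos == false);   // bool(true): loose comparison hides the match
var_dump($pos === false);  // bool(false): strict comparison detects it

// stripos ignores case, so "ADMIN" is also found at position 0.
var_dump(stripos($haystack, "ADMIN") !== false); // bool(true)
```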

"I'm trying to look for words containing admin, drop, create etc in form [Post] data and querystring data so I can accept or reject it."

I suspect that you are trying to filter the string so it's suitable for including in something like a database query. If this is the case, this is probably not a good way to go about it; you'd actually need to escape the string using mysql_real_escape_string() or equivalent.
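
One common "equivalent" is a prepared statement, which keeps the value out of the SQL text altogether. The sketch below is an illustration rather than the method described above: it assumes the PDO extension with the SQLite driver is available, and the in-memory database, table, and input are made up so the example is self-contained.

```php
<?php
// In-memory SQLite database, used only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Input that would be dangerous if concatenated into the SQL text.
$username = "admin'; DROP TABLE users; --";

// The placeholder keeps the value separate from the SQL statement.
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute(array(':name' => $username));

// The value is stored verbatim and the table still exists.
$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
var_dump($stored === $username);
```

With this approach words like "drop" or "admin" in the data are harmless to the query, so a word filter is only needed where the words themselves are unwanted, such as in usernames.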

Upvotes: 3
