Sleek Geek

Reputation: 4686

How to find exact words from an array in a string in PHP

I am searching a string for any of a group of words stored in an array, so that I can inform users if any are found. However, I get results that are not exact matches. How can I make it report only exact matches? My code is below.

<?php

// Profanity check  
 $profaneReport = "";
 $allContent = "Rice Beans Class stite";
 $profanity_list = "lass tite able";

      $profaneWords = explode( ' ', $profanity_list );

      $wordsFoundInProfaneList = []; // Array to collect the matched words

      //search for the words;

      foreach ( $profaneWords as $profane ) {

        if ( stripos( $allContent, $profane ) !== false ) {

          $wordsFoundInProfaneList[ $profane ] = true;

        }

      }


    // check if bad words were found

      if ( !empty( $wordsFoundInProfaneList ) ) {

         $profaneReportDesc = "Sorry, your content may contain such words as " . "<strong>" . implode( ", ", array_keys( $wordsFoundInProfaneList )) . '</strong>"';

      } else {

       $profaneReportDesc = "Good: No profanity was found in your content";

      }

     echo $profaneReportDesc;
  

?>

The code above returns: Sorry, your content may contain such words as lass, tite" even though neither "lass" nor "tite" is an exact match for any word in $allContent.

Upvotes: 1

Views: 423

Answers (1)

FluffyKitten

Reputation: 14312

For the benefit of other users looking for an answer to a similar question: building on Alex Howansky's comment, and adding more preparation of the input string so that it can more easily be converted into an array of words, you can do it like this:

  1. Remove all punctuation etc. that could affect breaking the string into individual words, and make sure the words are delimited by spaces, by replacing all non-alphanumeric characters with spaces (e.g. one,two.three will now be identifiable as 3 individual words)
  2. Convert the input string and profanity string to lower case for easier comparison
  3. Explode both strings into arrays (this is where replacing the punctuation in the input string with spaces is important!)
  4. Intersect the arrays to find the words common to both

You might also want to consider removing numerals from your input string, depending on how you want to handle numbers.
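
To illustrate the difference, here is a minimal sketch that cleans the same string both ways. The sample input and the variable names $withDigits and $lettersOnly are made up for this illustration and are not part of the original question.

```php
<?php
// Hypothetical sample input, chosen only to show the effect of digits.
$input = "Room 101 has 2 windows,and1door";

// Keep letters and digits: digits stay attached to neighbouring letters.
$withDigits = trim(preg_replace('/[^a-z0-9]+/', ' ', strtolower($input)));

// Keep letters only: digits are replaced by spaces, just like punctuation.
$lettersOnly = trim(preg_replace('/[^a-z]+/', ' ', strtolower($input)));

echo $withDigits, "\n";   // room 101 has 2 windows and1door
echo $lettersOnly, "\n";  // room has windows and door
```

With digits kept, "and1door" stays a single word, so neither "and" nor "door" would be matched on its own. With digits removed, it splits into two separate words.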

The complete code with detailed comments is as follows:

// Profanity check  
$profaneReport = "";
$profanity_list = "hello TEN test commas";    
$allContent = "Hello, world! This is a senTENce for testing. It has more than TEN words and contains some punctuation,like commas.";

/* Create an array of all words in lowercase (for easier comparison) */
$profaneWords = explode( ' ', strtolower($profanity_list) );

/* Replace everything but a-z and 0-9 (i.e. all punctuation etc.) in the sentence
   with spaces, so we can break the sentence into words.
   trim() drops the leading/trailing space left by punctuation at either end */
$alpha = trim(preg_replace("/[^a-z0-9]+/", " ", strtolower($allContent)));

/* Create an array of the words in the sentence */
$alphawords = explode( ' ', $alpha );

/* Get all words that are in both arrays, listing each matched word only once */
$wordsFoundInProfaneList = array_unique( array_intersect( $alphawords, $profaneWords ) );

// check if bad words were found, and display a message 
if ( !empty($wordsFoundInProfaneList)) {
    $profaneReportDesc = "Sorry, your content may contain such words as " . "<strong>" . implode( ", ", $wordsFoundInProfaneList) . '</strong>';
} else {
    $profaneReportDesc = "Good: No profanity was found in your content";
}
echo $profaneReportDesc;
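
As an alternative to splitting the text into arrays, you could test each word with a regular expression that uses word boundaries. The sketch below is only one possible way to do this; findExactWords() is a hypothetical helper name, not something from the question or from PHP itself. It relies on preg_quote() to escape the word and on \b to require a word boundary on both sides.

```php
<?php
/**
 * Hypothetical helper: returns the entries of $wordList that occur in
 * $content as whole words, compared case-insensitively.
 */
function findExactWords(string $content, array $wordList): array
{
    $found = [];
    foreach ($wordList as $word) {
        if ($word === '') {
            continue; // skip empty entries, e.g. from double spaces in the list
        }
        // preg_quote() escapes regex metacharacters in the word;
        // \b requires a word boundary, and the i flag ignores case.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $content) === 1) {
            $found[] = $word;
        }
    }
    return $found;
}

$allContent = "Rice Beans Class stite";

// "lass" and "tite" only occur inside longer words, so nothing is found.
print_r(findExactWords($allContent, explode(' ', "lass tite able")));

// "class" and "rice" are whole words in the content; "able" is not present.
print_r(findExactWords($allContent, explode(' ', "class rice able")));
```

Keep in mind that \b treats letters, digits and underscores as word characters, so "ten" would not be matched inside "ten_1". This approach runs one regex per listed word, so the array intersection above may be the better fit for long word lists.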

Upvotes: 4
