zuk1

Reputation: 18369

PHP Multiple Occurrences Of Words Within A String

I need to check a string to see if any word in it occurs more than once. So basically I will accept:

"google makes love"

but I don't accept:

"google makes google love" or "google makes love love google" etc.

Any ideas? I really don't know how to approach this, so any help would be greatly appreciated.

Upvotes: 2

Views: 4796

Answers (9)

Tom Haigh

Reputation: 57815

This seems fairly fast. It would be interesting to see (for all the answers) how the memory usage and time taken increase as you increase the length of the input string.

function check($str) {
    //remove double spaces
    $c = 1;
    while ($c) $str = str_replace('  ', ' ', $str, $c);

    //split into array of words
    $words = explode(' ', $str);
    foreach ($words as $key => $word) {
        //remove current word from array
        unset($words[$key]);
        //if it still exists in the array it must be duplicated
        //(strict comparison, so numeric-looking words such as "10" and "1e1" stay distinct)
        if (in_array($word, $words, true)) {
            return false;
        }
    }
    return true;
}

Edit

Fixed issue with multiple spaces. I'm not sure whether it is better to remove these at the start (as I have) or check each word is non-empty in the foreach.
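
For comparison, the second option mentioned above (skipping empty words inside the foreach instead of collapsing spaces first) might look roughly like this. It is only a sketch: the name checkSkippingEmpty is made up for illustration, and the strict in_array comparison is an added assumption.

```php
<?php
// Hypothetical variant: ignore empty "words" produced by repeated,
// leading or trailing spaces instead of cleaning the string up first.
function checkSkippingEmpty(string $str): bool
{
    $words = explode(' ', $str);
    foreach ($words as $key => $word) {
        // remove the current word so it is not compared with itself
        unset($words[$key]);
        // empty pieces come from extra spaces and are not real words
        if ($word === '') {
            continue;
        }
        // strict comparison keeps "10" and "1e1" distinct
        if (in_array($word, $words, true)) {
            return false;
        }
    }
    return true;
}
```

Calling checkSkippingEmpty('google  makes   love') should give true and checkSkippingEmpty('google  makes  google') should give false. Which of the two approaches is faster would need measuring.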

Upvotes: 1

meouw

Reputation: 42140

The regular expression way would definitely be my choice.

I did a little test on a string of 320 words with Veynom's function and a regular expression:

function preg( $txt ) {
    // \b around \1 so a word only matches itself, not part of a longer word
    return !preg_match( '/\b(\w+)\b.*?\b\1\b/', $txt );
}

Here's the test:

$time['preg'] = microtime( true );

for( $i = 0; $i < 1000; $i++ ) {
    preg( $txt );
}

$time['preg'] = microtime( true ) - $time['preg'];


$time['veynom-thewickedflea'] = microtime( true );

for( $i = 0; $i < 1000; $i++ ) {
    single_use_of_words( $txt );
}

$time['veynom-thewickedflea'] = microtime( true ) - $time['veynom-thewickedflea'];

print_r( $time );

And here's the result I got:

Array
(
    [preg] => 0.197616815567
    [veynom-thewickedflea] => 0.487532138824
)

Which suggests that the RegExp solution, as well as being a lot more concise, is more than twice as fast (for a string of 320 words and 1000 iterations).

When I run the test over 10,000 iterations, I get:

Array
(
    [preg] => 1.51235699654
    [veynom-thewickedflea] => 4.99487900734
)

The non RegExp solution also uses a lot more memory.

So... regular expressions for me, cos they've got a full tank of gas.

EDIT
The text I tested against has duplicate words. If it doesn't, the results may be different. I'll post another set of results.

Update
With the duplicates stripped out (now 186 words), the results for 1000 iterations are:

Array
(
    [preg] => 0.235826015472
    [veynom-thewickedflea] => 0.2528860569
)

About even.

Upvotes: 2

lc.

Reputation: 116478

The simplest method is to loop through each word and check against all previous words for duplicates.
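
A minimal sketch of that idea, assuming words are separated by whitespace. The function name hasOnlyUniqueWords is hypothetical, and the "previous words" are kept as array keys so each lookup is a key check rather than a second loop:

```php
<?php
// Hypothetical helper: returns true when no word appears twice.
function hasOnlyUniqueWords(string $str): bool
{
    // split on runs of whitespace and drop empty pieces
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    $seen = [];
    foreach ($words as $word) {
        // a word that was already recorded is a duplicate
        if (isset($seen[$word])) {
            return false;
        }
        $seen[$word] = true;
    }
    return true;
}
```

With the strings from the question, hasOnlyUniqueWords("google makes love") should return true and hasOnlyUniqueWords("google makes google love") should return false.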

Upvotes: 0

user19302

Reputation:

function Accept($str)
{
    $words = explode(" ", trim($str));
    $len = count($words);
    for ($i = 0; $i < $len; $i++)
    {
        for ($p = 0; $p < $len; $p++)
        {
            if ($p != $i && $words[$i] === $words[$p])
            {
                return false;
            }
        }
    }
    return true;
}

EDIT

Entire test script. Note: when echoing false, PHP prints nothing, while true is printed as "1".

<?php

function Accept($str)
{
    $words = explode(" ", trim($str));
    $len = count($words);
    for ($i = 0; $i < $len; $i++)
    {
        for ($p = 0; $p < $len; $p++)
        {
            if ($p != $i && $words[$i] === $words[$p])
            {
                return false;
            }
        }
    }
    return true;
}

echo Accept("google makes love"), ", ", Accept("google makes google love"), ", ",
    Accept("google makes love love google"), ", ", Accept("babe health insurance babe");


?>

Prints the correct output:

1, , , 

Upvotes: 1

Kevin

Reputation: 13087

No need for loops or arrays:

<?php

$needle = 'cat';
$haystack = 'cat in the cat hat';

if ( occursMoreThanOnce($haystack, $needle) ) {
    echo 'Success'; 
} 

function occursMoreThanOnce($haystack, $needle) {
    return strpos($haystack, $needle) !== strrpos($haystack, $needle);
}

?>
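
Note that strpos and strrpos compare substrings, so 'cat' would also be found inside 'category'. If whole-word matching is wanted, something along these lines might do. It is a sketch only: wordOccursMoreThanOnce is a made-up name, and it assumes $needle is a single word made of letters, digits or underscores.

```php
<?php
// Hypothetical whole-word variant of the check above.
function wordOccursMoreThanOnce(string $haystack, string $needle): bool
{
    // preg_quote escapes any regex metacharacters in the needle;
    // \b on both sides restricts matches to whole words
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    // preg_match_all returns the number of matches found
    return preg_match_all($pattern, $haystack) > 1;
}
```

Like the original, this checks one known word at a time; it does not find an arbitrary repeated word on its own.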

Upvotes: 3

Veynom

Reputation: 4147

Based on Wicked Flea's code:

function single_use_of_words($str) {  
   $words = explode(' ', trim($str));  //Trim to prevent any extra blank
   if (count(array_unique($words)) == count($words)) {
      return true; //Same amount of words
   }   
   return false;
}

Upvotes: 5

Wes Mason

Reputation: 1628

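<?php
// Split on runs of non-word characters. The pattern needs delimiters, and the
// flags go in the fourth argument (the third is the limit; -1 means no limit).
$words = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY);
$wordsUnique = array_unique($words);
if (count($words) != count($wordsUnique)) {
    echo 'Duplicate word found!';
}
?>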

Upvotes: 2

Robert K

Reputation: 30328

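Try this. It returns the string with repeated words removed, so compare the result with the input to detect duplicates: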

function single_use_of_words($str) {
  $words = explode(' ', $str);
  $words = array_unique($words);
  return implode(' ', $words);
}

Upvotes: 3

cmsjr

Reputation: 59185

Use a regular expression with a backreference:

http://www.regular-expressions.info/php.html

http://www.regular-expressions.info/named.html
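
As a rough illustration of what the two links describe, a named group plus a backreference to it could be used like this. The function name firstRepeatedWord and the group name word are made up for the example, and it assumes a "word" is a run of \w characters:

```php
<?php
// Hypothetical helper: returns the first word that occurs again later
// in the string, or null when every word is unique.
function firstRepeatedWord(string $str): ?string
{
    // (?P<word>\w+) captures a whole word under the name "word";
    // (?P=word) is a backreference that must match the same text again.
    // \b keeps both occurrences on word boundaries; the s flag lets
    // .*? run across line breaks.
    if (preg_match('/\b(?P<word>\w+)\b.*?\b(?P=word)\b/s', $str, $m)) {
        return $m['word'];
    }
    return null;
}
```

A string is acceptable in the sense of the question when firstRepeatedWord($str) returns null.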

Upvotes: -1
