Reputation: 2512
I'm building a small web app to manage texts from external writers. Overall it works well, but I have a small problem with the word counter.
The writers are paid by the number of words in a text. The text contains HTML tags, and it also uses German characters (Ä, Ö, Ü, ß).
First I strip the tags:
$content = strip_tags($content);
Then I replace newlines and tabs with plain spaces:
$replace = array("\r\n", "\n", "\r", "\t");
$content = str_replace($replace, ' ', $content);
Finally I try to get the number of words:
Method 1:
$characterMap = 'ÄÖÜäöü߀';
$count = str_word_count($content, 0, $characterMap);
Method 2:
$to_delete = array('.', ',', ';', "'", '@');
$content = str_replace($to_delete, '', $content);
$count = count(preg_split('~[^\p{L}\p{N}\']+~u',$content));
But the results differ from those of other tools, such as Word or the CKEditor word_count plugin.
For example, for a sample text:
Word and the CKEditor word count give 987 words
Method 1: 968 words
Method 2: 995 words
The problem with the second method is just the "-" separators between words. My question is: is there a better way to count the words in a text in PHP?
Upvotes: 3
Views: 1127
Reputation:
This might give a better approximation for method 2:
$string = "He€.llo, ho-w€d9 € are you? fi€ne ÄÖÜäöü߀, and 'ÄÖÜäöü߀ you?";
$words = preg_split
( '/[^\p{L}\p{N}]*\p{Z}[^\p{L}\p{N}]*/u',
$string
);
print( "count = " . count($words) . "\n\n" );
print_r($words);
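An alternative to splitting on separators is to match the words themselves, which avoids empty pieces at the start or end of the string and lets a hyphen inside a word stay part of that word. The following is only a sketch: the function name `count_words_utf8` is made up for illustration, the input is assumed to be valid UTF-8, and the rule "letters or digits joined by single hyphens or apostrophes" is an assumption that will not necessarily reproduce Word's or CKEditor's numbers.

```php
<?php
// Hypothetical helper (not from the question): counts words by matching
// them instead of splitting on separators. Assumes UTF-8 input.
function count_words_utf8(string $text): int
{
    // A word is a run of letters/digits, optionally continued by a single
    // hyphen or apostrophe followed by more letters/digits ("ho-w", "it's").
    $pattern = "/[\p{L}\p{N}]+(?:[-'][\p{L}\p{N}]+)*/u";

    // preg_match_all returns the number of matches, or false on error
    // (for example, invalid UTF-8 in $text).
    $n = preg_match_all($pattern, $text);

    return $n === false ? 0 : $n;
}

echo count_words_utf8("Ein gut-gelaunter Bär aß 3 Äpfel."), "\n"; // 6
```

Whether a standalone symbol such as "€" or a bare number should count as a word is a policy decision; adjust the character classes to match whatever the payment rules say.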
Upvotes: 0
Reputation: 72875
First, you could combine your two replace statements into one; str_word_count ignores repeated spaces. Second, I'm not sure what your regex is meant to do, but it looks mighty strange.
You should be able to simply do this:
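$characterMap = 'ÄÖÜäöü߀'; // extra word characters, as defined in the question
$content = strip_tags($content);
// Replace line breaks, tabs and punctuation with spaces in one pass
$replace = array("\r\n", "\n", "\r", "\t", '.', ',', ';', "'", '@');
$content = str_replace($replace, ' ', $content);
$count = str_word_count($content, 0, $characterMap);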
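One thing to watch in the stripping step: strip_tags removes tags without putting anything in their place, so `<p>one</p><p>two</p>` becomes `onetwo`, and HTML entities such as `&ouml;` are left undecoded. Both can shift the count. Here is a minimal sketch of a pre-processing step, assuming UTF-8 input; the helper name `html_to_plain_text` is hypothetical, and adding a space before every tag also splits words that contain inline markup (`wor<b>d</b>` becomes `wor d`), which may or may not be acceptable.

```php
<?php
// Hypothetical helper (not from the question): turns an HTML fragment into
// plain text that is easier to count. Assumes UTF-8 input.
function html_to_plain_text(string $html): string
{
    // Put a space in front of every tag so adjacent block elements
    // do not get glued together once the tags are removed.
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);

    // Decode entities such as &ouml; or &nbsp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace and Unicode separators (including
    // the non-breaking space) into single spaces; keep $text on error.
    $collapsed = preg_replace('/[\s\p{Z}]+/u', ' ', $text);

    return trim($collapsed ?? $text);
}

echo html_to_plain_text('<p>Sch&ouml;ne</p><p>Gr&uuml;&szlig;e</p>'), "\n"; // Schöne Grüße
```

The returned string can then be passed to whichever counting method is used.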
Upvotes: 1
Reputation: 496
You could try taking a look at str_word_count and see if that matches up better than your current solutions.
http://php.net/manual/en/function.str-word-count.php
An example of usage:
$Tags = 'My Name is Gaurav';
$word = str_word_count($Tags);
echo $word; // 4
Upvotes: 0