Reputation: 3018
I'm using str_word_count() to count the words in content from CKEditor. The content I get from CKEditor is HTML, and I need its word count. In MS Word the count is 328, but running str_word_count() on the HTML content gives 362. Is there any way to remove the HTML tags from a PHP string variable? I tried strip_tags(), and it gave me 336. Is there any way to get the exact word count in PHP? Thank you in advance.
For example, a user entered this essay:
Mixed School or Unisex School
Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.
In MS Word the word count is 107.
In PHP:
Mixed School or Unisex School
Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.
The result: 114.
So I'm counting 7 extra words for a one-paragraph essay.
After using
$text = strip_tags($this->orginal_content);
$text = str_replace('&nbsp;', "", $text);
$this->orginal_content_count = str_word_count($text);
the result is 112.
I've found 3 spaces:
Mixed School or Unisex School Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.
Upvotes: 1
Views: 3719
Reputation: 168715
Okay.
You already know about strip_tags(). That's a good start.
You're replacing &nbsp; by hand, but that only deals with that single specific entity. You would be better off using PHP's html_entity_decode() function, which converts all of the entity codes in your string back into the characters they represent.
If extra spacing is causing you problems, you could use str_replace() or preg_replace() to get rid of it, e.g.:
$output = preg_replace('/\s\s+/',' ',$input);
This will convert all multiple-whitespace instances into a single space character.
Now your word count should work a little better.
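Putting those steps together, here is a minimal sketch rather than a definitive fix. The function name count_words_in_html() is hypothetical, and the code assumes UTF-8 input and PHP 7 or later. It also makes two changes that go beyond the steps above. First, it treats U+00A0 (the character &nbsp; decodes to) as whitespace. Second, it counts whitespace-separated tokens instead of calling str_word_count(), on the assumption that str_word_count() splitting words at multibyte characters such as the curly apostrophe in "child’s" accounts for part of the remaining difference.

```php
<?php
// Hypothetical helper (not from the original post): counts the
// whitespace-separated tokens in an HTML fragment. Assumes UTF-8 input.
function count_words_in_html(string $html): int
{
    // Put a space before every tag so "</p><p>" does not glue two words together.
    $text = str_replace('<', ' <', $html);
    $text = strip_tags($text);
    // Decode entities such as &nbsp; and &amp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs, including U+00A0 (decoded &nbsp;), into one space.
    $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text));
    if ($text === '') {
        return 0;
    }
    // Count space-separated tokens instead of relying on str_word_count().
    return count(explode(' ', $text));
}

$html = '<p>Mixed School&nbsp;or Unisex School</p><p>their child’s education</p>';
echo count_words_in_html($html), "\n"; // prints 8
```

Counting tokens this way treats "child’s" and "U.S" as one word each, which is closer to how a word processor counts, though an exact match with MS Word is not guaranteed.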
Hope that helps.
Upvotes: 2