Othman

Reputation: 3018

str_word_count() without HTML

I'm using str_word_count() to count the words in content that comes from CKEditor. The content I get from CKEditor is HTML, and I need its word count. In MS Word I get a word count of 328, but running str_word_count() on the HTML gives 362 words. Is there any way to remove the HTML tags from a PHP string variable? I tried strip_tags(), and it gave me 336. Is there any way to get the exact word count in PHP? Thank you in advance.

For example, a user entered this essay:

Mixed School or Unisex School

Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.

In MS Word, the word count is 107.

In PHP:

 

Mixed School or Unisex School

 

Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons.

and the result is 114.

I'm counting 7 extra words for a one-paragraph essay.

edit

after using

    $text = strip_tags($this->orginal_content);
    $text = str_replace('&nbsp;',"",$text);
    $this->orginal_content_count = str_word_count($text);

the result: 112

I've found 3 spaces

        Mixed School or Unisex School       Have you ever think about the impact of mixed schools for students? Most of the schools in the U.S are mixed gender, which mean girls and boys are studying with each other in the same classroom. Some parents wonder about the influences of their child’s in the school either in mixed school or in unisex ones. These influences are not about the education only, the influences about their personality, behavior with the opposite sex and finally their education. In my opinion, I think the unisex schools for teenager’s students are much better than mixed schools, and this conclusion based in many reasons. 

Upvotes: 1

Views: 3719

Answers (1)

Spudley

Reputation: 168715

Okay.

You already know about strip_tags(). That's a good start.

You're replacing &nbsp; with a space, but that only deals with that one specific entity. You would be better off using PHP's html_entity_decode() function, which converts all of the entity codes in your string back into plain characters.
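As a rough illustration of the difference, here is a minimal sketch that decodes every entity in one call. The sample string is made up for the example. One thing to be aware of: &nbsp; decodes to a non-breaking space (U+00A0), not an ordinary space, so the sketch maps it to a regular space afterwards.

```php
<?php
// Made-up sample input, not taken from the question.
$text = 'Fish&nbsp;&amp;&nbsp;chips';

// Decode all named and numeric entities at once instead of
// replacing a single entity by hand.
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; is now the UTF-8 byte pair for U+00A0 (0xC2 0xA0).
// Turn it into an ordinary space so later whitespace handling sees it.
$decoded = str_replace("\xC2\xA0", ' ', $decoded);

echo $decoded, PHP_EOL; // prints "Fish & chips"
```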

If extra spacing is causing you problems, you could use str_replace() or preg_replace() to get rid of it, e.g.:

    $output = preg_replace('/\s\s+/',' ',$input);

This will convert all multiple-whitespace instances into a single space character.

Now your word count should work a little better.
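Putting those steps together might look something like the sketch below. The function names html_to_plain_text() and count_words_in_html() are hypothetical helpers invented for this example, and the regular expression is a variation on the one above: it also matches the non-breaking space (U+00A0) that &nbsp; decodes to, which is an assumption about what the editor output contains.

```php
<?php
// Hypothetical helper: reduce editor HTML to plain, single-spaced text.
function html_to_plain_text(string $html): string
{
    // Remove the markup, keeping only the text between tags.
    $text = strip_tags($html);

    // Decode every entity (&nbsp;, &amp;, ...) rather than one at a time.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace, including U+00A0, into a single space.
    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}

// Hypothetical helper: word count of the cleaned text.
function count_words_in_html(string $html): int
{
    return str_word_count(html_to_plain_text($html));
}

$html = '<p>&nbsp;</p><h1>Mixed School</h1>   <p>Have&nbsp;you   ever</p>';
echo html_to_plain_text($html), PHP_EOL;  // prints "Mixed School Have you ever"
echo count_words_in_html($html), PHP_EOL; // prints 5
```

The count may still differ slightly from MS Word, because str_word_count() only treats letters, apostrophes and hyphens as part of a word. Something like "U.S" is therefore counted as two words.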

Hope that helps.

Upvotes: 2
