Reputation: 609
Thanks to Document Transformations on Filestack, I can get a text/plain output from .DOC/.DOCX files. I want to count only the words in this output (no numbers or punctuation symbols) with PHP and display the count in an HTML page. So far I have this:
<button type="button" id="load" class="btn btn-md btn-info">LOAD FILES</button>
<br>
<div id="result"></div>
<script src="../vendors/jquery/dist/jquery.min.js"></script>
<script src="https://static.filestackapi.com/v3/filestack.js"></script>
<script>
function numWordsR(urlk) {
    $.post("result_filestack.php", {
        // urlk example: https://process.filestackapi.com/output=format:txt/AXXXXAXeeeeW33A
        molk: urlk
    }).done(function (resp) {
        $("#result").html(resp);
    });
}
</script>
And my file result_filestack.php:
$url = $_POST['molk'];
$content = file_get_contents($url); //get txt/plain output content
$onlywords = preg_replace('/[[:punct:]\d]+/', '', $content); //no numbers nor punctuation symbols
function get_num_of_words($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return count($words);
}
$numwords = get_num_of_words($onlywords);
echo "<b>TEXT:</b> ".$onlywords."<br><br>Number of words: ".$numwords;
I obtain this result:
For example, in this case the result says there are 585 words in the text, but if I copy and paste that text into MS Word, it reports 612 words. I changed the PHP code to print the array of words:
function get_text($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return $words;
}
$texto002 = get_text($onlywords);
print_r($texto002); // print_r already prints; echoing its return value would append a stray "1"
I noticed that the words are counted incorrectly: in some places, two or three words are treated as one.
How can I fix it?
I'd like your help.
Upvotes: 2
Views: 106
Reputation: 447
It could be because the spaces aren't regular spaces but special characters, such as non-breaking spaces. I ran into this a while back; before exploding on the regular space, I replaced the entities with a space:
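function get_num_of_words($string) {
    // Turn non-breaking spaces (HTML entity and UTF-8 character) into regular spaces first
    $string = str_replace("&nbsp;", " ", $string);
    $string = str_replace("\xC2\xA0", " ", $string);
    // Then collapse every run of whitespace into a single space
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return count($words);
}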
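If other invisible separators turn up besides the non-breaking space, an alternative is to stop splitting on spaces and count runs of letters instead. The following is only a minimal sketch, assuming the Filestack output is UTF-8; `count_letter_words` is a hypothetical helper name, not part of the original code:

```php
<?php
// Hypothetical helper: counts runs of letters instead of splitting on spaces,
// so the exact kind of whitespace between words does not matter.
// Assumption: $text is UTF-8 encoded.
function count_letter_words(string $text): int
{
    // Decode HTML entities such as &nbsp; into real characters first.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // \p{L}+ matches a run of Unicode letters. The optional group keeps
    // apostrophes (' or U+2019) and hyphens inside a word ("don't", "well-known").
    // The /u flag tells PCRE to treat pattern and subject as UTF-8.
    $count = preg_match_all('/\p{L}+(?:[\'\x{2019}-]\p{L}+)*/u', $text);

    // preg_match_all() returns false on failure (for example, invalid UTF-8).
    return $count === false ? 0 : $count;
}
```

It could be called as `count_letter_words($content)` on the raw text, since digits and punctuation are never matched in the first place. The total may still differ from MS Word's, which counts standalone numbers as words.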
Upvotes: 2