Tawfiq abu Halawah

Reputation: 1234

Using JavaScript: how to count words in a Microsoft Word document?

I am trying to count the words in a Microsoft Word document using JavaScript. I managed to count the words in a normal text file. Is there a way to do it for a Microsoft Word file, for example using the "JavaScript API for Office" or any other method?

Check this plunk: https://plnkr.co/edit/5TJfNiPxv275GuimdIlj?p=preview

<!DOCTYPE html>
<html>

  <head>
    <link rel="stylesheet" href="style.css">
    <script src="script.js"></script>
  </head>

  <body>
    <h2>Microsoft Word Document Count Words! Using JavaScript?</h2>
    <input type="file" accept=".doc,.txt,.docx" onchange="calculateWords()" id="textDoc"/>
    <div>
      <h1 id="fileInformation">File word Count after choose</h1>
    </div>
  </body>

</html>

JavaScript Code

function calculateWords() {
    // Check that the browser supports the File APIs used below
    if (window.File && window.FileReader && window.FileList && window.Blob) {
        var doc = document.getElementById("textDoc");
        var f = doc.files[0];
        if (!f) {
            alert("Failed to load file");
        } else {
            // TODO: validate file types; only plain text gives a meaningful result
            // with readAsText, e.g. alert(f.type + " is not a valid text file.");
            var r = new FileReader(); // create file reader object

            // attach the handler before starting the read
            r.onload = function (e) {
                var contents = e.target.result;
                // split on any run of whitespace (spaces, tabs, newlines) and drop empty strings
                var words = contents.split(/\s+/).filter(function (w) {
                    return w.length > 0;
                });
                console.log(words.length);
                var info = document.getElementById("fileInformation");
                info.innerHTML = "word Count = " + words.length;
            };

            r.readAsText(f); // read file as text
        }
    } else {
        alert('The File APIs are not fully supported by your browser.');
    }
}

Upvotes: 2

Views: 2018

Answers (1)

Jeremy J Starcher

Reputation: 23863

Microsoft Word documents are not like normal text files; they are binary files.
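One way to see this is to look at the first few bytes of a file instead of reading it as text. The sketch below assumes the well-known signatures: legacy .doc files are OLE2 compound files starting with the bytes D0 CF 11 E0 A1 B1 1A E1, .docx files are ZIP archives starting with "PK" followed by 03 04, and RTF files start with the text {\rtf. The helper name sniffFormat is hypothetical, and the result is only a hint: .xls files also use OLE2, and .xlsx files are also ZIP archives.

```javascript
// Sketch: guess a file's container format from its first bytes ("magic numbers").
// sniffFormat is a hypothetical helper; the result is a hint, not proof.
const SIGNATURES = [
  { format: "ole2", bytes: [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1] }, // legacy .doc (also .xls, .ppt)
  { format: "zip", bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04": .docx (also .xlsx, plain .zip)
  { format: "rtf", bytes: [0x7b, 0x5c, 0x72, 0x74, 0x66] }, // "{\rtf"
];

// bytes: a Uint8Array (or array of numbers) holding the start of the file
function sniffFormat(bytes) {
  for (const sig of SIGNATURES) {
    const matches =
      bytes.length >= sig.bytes.length &&
      sig.bytes.every((b, i) => bytes[i] === b);
    if (matches) {
      return sig.format;
    }
  }
  return "unknown"; // e.g. plain text, which has no signature
}
```

In the browser you could call FileReader.readAsArrayBuffer(f) instead of readAsText(f), then use sniffFormat(new Uint8Array(e.target.result)) inside onload to decide whether readAsText would give you anything meaningful.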

So you would have to decode these files into plain text, remove all formatting, and strip headers and footers before you could count anything. This is a significant challenge.

Just as a simple example, this is a piece of an RTF file:

{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}
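To make the decoding step concrete, here is a naive sketch that turns the RTF sample above into plain text and counts whitespace-separated words. It handles only what the sample uses: brace groups, control words such as \b and \par, the escapes \\, \{ and \}, and a hypothetical, non-exhaustive list of header groups to skip (the font table and similar). It ignores \'hh hex escapes, \uN Unicode escapes and embedded objects, so treat it as an illustration of the problem rather than an RTF parser; rtfToText and countWords are hypothetical names.

```javascript
// Naive RTF-to-text sketch: handles only the constructs used in the sample.
// Hypothetical, non-exhaustive list of header "destinations" that are not body text.
const SKIP_DESTINATIONS = ["fonttbl", "colortbl", "stylesheet", "info", "pict"];

function rtfToText(rtf) {
  let out = "";
  const stack = []; // saved "skipping" state for each open group
  let skipping = false; // true while inside a group we are ignoring
  let i = 0;
  while (i < rtf.length) {
    const ch = rtf[i];
    if (ch === "{") {
      stack.push(skipping);
      i++;
      // A group starting with \* or a known destination word is not body text
      const dest = /^\\(\*|[a-z]+)/.exec(rtf.slice(i));
      if (dest && (dest[1] === "*" || SKIP_DESTINATIONS.includes(dest[1]))) {
        skipping = true;
      }
    } else if (ch === "}") {
      skipping = stack.length > 0 ? stack.pop() : false;
      i++;
    } else if (ch === "\\") {
      const rest = rtf.slice(i);
      // Control word: backslash, letters, optional numeric parameter, optional space
      const word = /^\\([a-z]+)(-?\d+)? ?/.exec(rest);
      if (word) {
        // \par and \line end a line; every other control word is formatting
        if (!skipping && (word[1] === "par" || word[1] === "line")) out += "\n";
        i += word[0].length;
      } else {
        // Control symbol: keep escaped \\, \{ and \}, drop the rest (e.g. \*)
        if (!skipping && (rest[1] === "\\" || rest[1] === "{" || rest[1] === "}")) {
          out += rest[1];
        }
        i += 2;
      }
    } else {
      // Raw line breaks in the RTF source are not part of the text
      if (!skipping && ch !== "\n" && ch !== "\r") out += ch;
      i++;
    }
  }
  return out;
}

// Count runs of non-whitespace characters
function countWords(text) {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}
```

With the sample stored in a string, rtfToText returns "This is some bold text." followed by a newline, and countWords of that result is 5.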

.DOC files are much more complicated, and also binary. DOCX files are different again: they are ZIP archives containing XML files.
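For DOCX, a rough sketch is possible once the archive has been unpacked. It assumes you have already extracted the main document part (usually word/document.xml) from the ZIP archive with some ZIP library, which is not shown here. Text in that XML lives in <w:t> elements inside runs (<w:r>) and paragraphs (<w:p>); the sketch joins the runs of each paragraph, treats <w:tab/>, <w:br/> and <w:cr/> as spaces, and counts whitespace-separated words. It uses regular expressions instead of a real XML parser, does not decode XML entities, and ignores headers, footers and footnotes, which are stored in separate parts; docxXmlToText and countWords are hypothetical names.

```javascript
// Sketch: extract paragraph text from the XML of a .docx main document part
// (usually word/document.xml). Assumption: the XML string has already been
// unzipped from the .docx archive by some ZIP library (not shown).
function docxXmlToText(xml) {
  return xml
    .split("</w:p>") // each chunk ends with one paragraph
    .map((chunk) => {
      // Match run text <w:t ...>...</w:t>, or a tab/break element
      const re = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>|<w:(?:tab|br|cr)\b[^>]*\/>/g;
      let text = "";
      let m;
      while ((m = re.exec(chunk)) !== null) {
        // Runs in one paragraph are joined directly (a word may span runs);
        // tabs and breaks become spaces
        text += m[1] !== undefined ? m[1] : " ";
      }
      return text;
    })
    .filter((t) => t.trim().length > 0)
    .join("\n");
}

// Count runs of non-whitespace characters
function countWords(text) {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}
```

Joining runs without a separator matters because Word often splits a single word across several runs (for example when part of it is formatted differently), so inserting a space between runs would overcount.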

So, the simple answer is: no, you can't do it.

Upvotes: 2
