Jigar patel

Reputation: 215

Counting Words in a Word Document Using VB.NET

I want to count only the words in a Word document, but I am getting a different output.

I have tried this:

    ' Start a single Word instance.
    Dim objapp As Word.Application = New Word.Application()

    ' Open specified file.
    Dim doc As Document = objapp.Documents.Open(TextBox1.Text & "\" & "TEST.doc")

    ' Loop through all words.
    Dim count As Integer = doc.Words.Count
    For i As Integer = 1 To count
        ' Read each word (not used further here).
        Dim text As String = doc.Words(i).Text

    Next
    Dim objWriter As New System.IO.StreamWriter(TextBox1.Text & "\" & "Error.txt")
    objWriter.Write("Word Count :" & count)
    objWriter.Close()
    ' Close the document, then quit the application.
    doc.Close()
    objapp.Quit()

Here I am able to count the words, but the count also includes every Enter (paragraph mark) in the document. For example, if there are 8 words in the document with 2 Enters, it shows a count of 10 when it should show 8, i.e. only the words.

Could anyone please help me with the required logic?

Thanks in advance.

Upvotes: 4

Views: 2035

Answers (3)

Deepak Sharma

Reputation: 4170

I am not sure about VB.NET, but if C# code can help you, here is a word count in C#.

    /* Button click event: open the document from the file path,
     * collect all of its text, then count the words.
     */
    private void btnWordCount_Click(object sender, EventArgs e)
    {
        Microsoft.Office.Interop.Word.Application word =
            new Microsoft.Office.Interop.Word.Application();
        object miss = System.Reflection.Missing.Value;

        object path = doc_file_path;
        object readOnly = true;

        Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(
                                                    ref path, ref miss, ref readOnly, ref miss,
                                                    ref miss, ref miss, ref miss, ref miss,
                                                    ref miss, ref miss, ref miss, ref miss,
                                                    ref miss, ref miss, ref miss, ref miss);
        string totaltext = "";
        for (int i = 0; i < docs.Paragraphs.Count; i++)
        {
            totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();
        }
        tbText.Text = totaltext;
        lblWordCount.Text = WordCount(totaltext).ToString();
        docs.Close();
        word.Quit();
    }

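    /* This function accepts a string (here, the full text of the document)
     * and returns the number of whitespace-separated words in it.
     */
    private int WordCount(string line)
    {
        // Split on any whitespace and drop empty entries, so repeated spaces
        // and paragraph marks are not counted as words.
        char[] separators = { ' ', '\t', '\r', '\n' };
        return line.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }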
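
Since the question asks for VB.NET, here is a rough VB.NET sketch of the same idea: split the text on whitespace and ignore empty entries so that paragraph marks and repeated spaces do not inflate the count. The module name and the sample string are made up for illustration; only `String.Split` and `StringSplitOptions` come from the framework.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitWordCountSketch
    ' Counts whitespace-separated words in a string.
    Function WordCount(line As String) As Integer
        ' A null separator array tells String.Split to split on whitespace.
        Dim separators As Char() = Nothing
        Return line.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length
    End Function

    Sub Main()
        ' Three words separated by a space and two line breaks.
        Dim sample As String = "alpha beta" & ControlChars.CrLf & ControlChars.CrLf & "gamma"
        Console.WriteLine(WordCount(sample)) ' prints 3
    End Sub
End Module
```

Punctuation that stands on its own between spaces would still be counted as a word with this approach.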

Upvotes: 0

Harrison

Reputation: 3953

According to the documentation for the Words interface:

The Count property includes punctuation and paragraph marks in the total. If you need a count of the actual words in a document, use the Word Count dialog box.

I also found a support knowledge base article, Word count appears inaccurate when you use the VBA "Words" property, which says:

To return only the number of words in a document or a range, excluding paragraph marks and punctuation, use the ComputeStatistics method instead of the Words property.

Range.ComputeStatistics Method

    'Usage
    Dim Statistic As WdStatistic
    Dim returnValue As Integer
    Dim range1 As Range
    returnValue = range1.ComputeStatistics(Statistic)
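
Applied to the code in the question, that could look roughly like the sketch below. It assumes a reference to the Word interop assembly; the module name, the function name, and the `docPath` parameter are hypothetical. `ComputeStatistics` with `WdStatistic.wdStatisticWords` is the call the article above recommends in place of `Words.Count`.

```vbnet
Imports Word = Microsoft.Office.Interop.Word

Module ComputeStatisticsSketch
    ' Returns Word's own word count for the document at docPath.
    Function CountWords(docPath As String) As Integer
        Dim wordApp As New Word.Application()
        Dim doc As Word.Document = Nothing
        Try
            doc = wordApp.Documents.Open(docPath)
            ' Ask Word for the word statistic instead of using doc.Words.Count.
            Return doc.ComputeStatistics(Word.WdStatistic.wdStatisticWords)
        Finally
            ' Close the document before quitting the Word instance.
            If doc IsNot Nothing Then doc.Close(SaveChanges:=False)
            wordApp.Quit()
        End Try
    End Function
End Module
```

Per the article quoted above, this should exclude paragraph marks, so the document from the question with 8 words and 2 Enters would be expected to report 8.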

Upvotes: 1

Vignesh Kumar A

Reputation: 28403

Use a regex to match only the words in the text, like this:

    Dim WordCount = New Regex("\w+").Matches(text).Count
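
A small self-contained sketch of this, with a made-up sample string standing in for the document text (in practice `text` might come from something like the document's content range, which is an assumption here):

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexWordCountSketch
    ' Counts runs of word characters (letters, digits, underscore).
    Function CountWords(text As String) As Integer
        Return Regex.Matches(text, "\w+").Count
    End Function

    Sub Main()
        ' Eight words separated by spaces and two paragraph marks.
        Dim sample As String = "one two three four" & ControlChars.Cr &
                               "five six" & ControlChars.Cr & "seven eight"
        Console.WriteLine(CountWords(sample)) ' prints 8
    End Sub
End Module
```

Note that `\w+` treats an apostrophe as a separator, so a word like "don't" is counted as two matches.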

Upvotes: 0
