Reputation: 215
I want to count only the words in a Word document, but I am getting a different result.
I have tried this:
Dim objapp As Word.Application
objapp = New Word.Application()
' Open the specified file.
Dim doc As Document = objapp.Documents.Open(TextBox1.Text & "\" & "TEST.doc")
' Count the items in the Words collection.
Dim count As Integer = doc.Words.Count
' Loop through all words.
For i As Integer = 1 To count
    ' Read each word (the value is not used further).
    Dim text As String = doc.Words(i).Text
Next
Dim objWriter As New System.IO.StreamWriter(TextBox1.Text & "\" & "Error.txt")
objWriter.Write("Word Count :" & count)
objWriter.Close()
' Close the document, then quit the same Word instance that opened it.
doc.Close()
objapp.Quit()
Here I am able to count the words, but the count also includes every Enter (paragraph mark) in the document. For example, if the document has 8 words and 2 Enters, it shows count: 10, when it should show count: 8, i.e. only the words.
Could anyone please help me with the required logic?
Thanks in advance.
Upvotes: 4
Views: 2035
Reputation: 4170
I am not sure about VB.NET, but if C# code can help you, here is a word count in C#.
/* Button click handler: open the document at the given path,
 * collect the text of every paragraph, then count the words.
 */
private void btnWordCount_Click(object sender, EventArgs e)
{
    Microsoft.Office.Interop.Word.Application word =
        new Microsoft.Office.Interop.Word.Application();
    object miss = System.Reflection.Missing.Value;
    object path = doc_file_path;
    object readOnly = true;
    Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(
        ref path, ref miss, ref readOnly, ref miss,
        ref miss, ref miss, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss);
    string totaltext = "";
    // The Paragraphs collection is 1-based, hence i + 1.
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();
    }
    tbText.Text = totaltext;
    lblWordCount.Text = WordCount(totaltext).ToString();
    docs.Close();
    word.Quit();
}
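/* Accepts a string (here, the full text of the Word document)
 * and returns the number of words in it.
 */
private int WordCount(string line)
{
    // A null separator array splits on any white space; RemoveEmptyEntries
    // keeps line breaks and repeated spaces from being counted as words.
    return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
}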
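Since the question is about VB.NET, the counting step might be sketched in VB.NET roughly as follows. This is a minimal sketch rather than a port of the handler above: `CountWordsBySplitting` and the sample text are hypothetical, and the text is assumed to have already been read from the document. It splits on any white space and drops empty entries, so paragraph breaks and repeated spaces are not counted. Punctuation that stands alone between spaces would still be counted as a word.

```vbnet
Imports System

Module SplitWordCount
    ' Hypothetical helper: counts whitespace-separated tokens in a string.
    Function CountWordsBySplitting(text As String) As Integer
        ' A Nothing separator array makes Split treat white space as the delimiter;
        ' RemoveEmptyEntries drops the empty strings between consecutive delimiters.
        Dim separators As Char() = Nothing
        Return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length
    End Function

    Sub Main()
        ' Five words, a double space, and two line breaks.
        Dim sample As String = "one two  three" & Environment.NewLine &
                               Environment.NewLine & "four five"
        Console.WriteLine(CountWordsBySplitting(sample)) ' prints 5
    End Sub
End Module
```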
Upvotes: 0
Reputation: 3953
According to the documentation for the Words interface:
The Count property includes punctuation and paragraph marks in the total. If you need a count of the actual words in a document, use the Word Count dialog box.
I also found a support knowledge base article, "Word count appears inaccurate when you use the VBA "Words" property", which says:
To return only the number of words in a document or a range, excluding paragraph marks and punctuation, use the ComputeStatistics method instead of the Words property.
Range.ComputeStatistics Method
'Usage
Dim Statistic As WdStatistic
Dim returnValue As Integer
Dim range1 As Range
returnValue = range1.ComputeStatistics(Statistic)
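Applied to the question, the usage above might look roughly like the sketch below. It assumes a reference to the Word interop assembly, and the file path is a placeholder for illustration only. The variables are typed as `_Application` and `_Document` so that `Quit` and `Close` resolve to the methods rather than to the events of the same name.

```vbnet
Imports System
Imports Word = Microsoft.Office.Interop.Word

Module ComputeStatisticsDemo
    Sub Main()
        ' Placeholder path (an assumption); replace it with the real document location.
        Dim path As Object = "C:\Temp\TEST.doc"

        Dim wordApp As Word._Application = New Word.Application()
        Dim doc As Word._Document = wordApp.Documents.Open(path)
        Try
            ' wdStatisticWords asks Word for its own word count for the range,
            ' which leaves out the paragraph marks that Words.Count includes.
            Dim wordCount As Integer =
                doc.Content.ComputeStatistics(Word.WdStatistic.wdStatisticWords)
            Console.WriteLine("Word Count: " & wordCount)
        Finally
            ' Close the document without saving, then quit Word.
            doc.Close(SaveChanges:=False)
            wordApp.Quit()
        End Try
    End Sub
End Module
```

`doc.Content` covers the main text of the document. Calling `ComputeStatistics` on a smaller `Range` would count only the words in that range.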
Upvotes: 1
Reputation: 28403
Use a regular expression to match the words,
like this:
Dim WordCount = New Regex("\w+").Matches(text).Count
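A self-contained version of that one-liner might look like the sketch below. `CountWordsWithRegex` and the sample text are hypothetical, and the text is assumed to have already been read from the document. Note that `\w+` matches runs of letters, digits and underscores, so a number counts as a word and a contraction such as "don't" counts as two matches.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexWordCount
    ' Hypothetical helper: counts runs of word characters in a string.
    Function CountWordsWithRegex(text As String) As Integer
        Return Regex.Matches(text, "\w+").Count
    End Function

    Sub Main()
        ' Eight words separated by spaces and two line breaks, as in the question.
        Dim sample As String = "one two three" & Environment.NewLine &
                               Environment.NewLine & "four five six seven eight"
        Console.WriteLine(CountWordsWithRegex(sample)) ' prints 8
    End Sub
End Module
```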
Upvotes: 0