Reputation: 14084
There are some good examples on how to calculate word frequencies in C#, but none of them are comprehensive and I really need one in VB.NET.
My current approach is limited to one word per frequency count. What is the best way to change this so that I can get a completely accurate word frequency listing?
wordFreq = New Hashtable()
Dim words As String() = Regex.Split(inputText, "(\W)")
For i As Integer = 0 To words.Length - 1
If words(i) <> "" Then
Dim realWord As Boolean = True
For j As Integer = 0 To words(i).Length - 1
If Char.IsLetter(words(i).Chars(j)) = False Then
realWord = False
End If
Next j
If realWord = True Then
If wordFreq.Contains(words(i).ToLower()) Then
wordFreq(words(i).ToLower()) += 1
Else
wordFreq.Add(words(i).ToLower, 1)
End If
End If
End If
Next
Me.wordCount = New SortedList
For Each de As DictionaryEntry In wordFreq
If wordCount.ContainsKey(de.Value) = False Then
wordCount.Add(de.Value, de.Key)
End If
Next
I'd prefer an actual code snippet, but generic 'oh yeah...use this and run that' would work as well.
Upvotes: 3
Views: 5860
Reputation: 55472
This might be what your looking for:
Dim Words = "Hello World ))))) This is a test Hello World"
Dim CountTheWords = From str In Words.Split(" ") _
Where Char.IsLetter(str) _
Group By str Into Count()
I have just tested it and it does work
EDIT! I have added code to make sure that it counts only letters and not symbols.
FYI: I found an article on how to use LINQ and target 2.0, its a feels bit dirty but it might help someone http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx
Upvotes: 3
Reputation: 25271
Pretty close, but \w+ is a good regex to match with (matches word characters only).
Public Function CountWords(ByVal inputText as String) As Dictionary(Of String, Integer)
Dim frequency As New Dictionary(Of String, Integer)
For Each wordMatch as Match in Regex.Match(inputText, "\w+")
If frequency.ContainsKey(wordMatch.Value.ToLower()) Then
frequency(wordMatch.Value.ToLower()) += 1
Else
frequency.Add(wordMatch.Value.ToLower(), 1)
End If
Next
Return frequency
End Function
Upvotes: 1
Reputation:
Public Class CountWords
Public Function WordCount(ByVal str As String) As Dictionary(Of String, Integer)
Dim ret As Dictionary(Of String, Integer) = New Dictionary(Of String, Integer)
Dim word As String = ""
Dim add As Boolean = True
Dim ch As Char
str = str.ToLower
For index As Integer = 1 To str.Length - 1 Step index + 1
ch = str(index)
If Char.IsLetter(ch) Then
add = True
word += ch
ElseIf add And word.Length Then
If Not ret.ContainsKey(word) Then
ret(word) = 1
Else
ret(word) += 1
End If
word = ""
End If
Next
Return ret
End Function
End Class
Then for a quick demo application, create a winforms app with one multiline textbox called InputBox, one listview called OutputList and one button called CountBtn. In the list view create two columns - "Word" and "Freq." Select the "details" list type. Add an event handler for CountBtn. Then use this code:
Imports System.Windows.Forms.ListViewItem
Public Class MainForm
Private WordCounts As CountWords = New CountWords
Private Sub CountBtn_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CountBtn.Click
OutputList.Items.Clear()
Dim ret As Dictionary(Of String, Integer) = Me.WordCounts.WordCount(InputBox.Text)
For Each item As String In ret.Keys
Dim litem As ListViewItem = New ListViewItem
litem.Text = item
Dim csitem As ListViewSubItem = New ListViewSubItem(litem, ret.Item(item).ToString())
litem.SubItems.Add(csitem)
OutputList.Items.Add(litem)
Word.Width = -1
Freq.Width = -1
Next
End Sub
End Class
You did a terrible terrible thing to make me write this in VB and I will never forgive you.
:p
Good luck!
EDIT
Fixed blank string bug and case bug
Upvotes: 2
Reputation: 827356
This might be helpful:
Word frequency algorithm for natural language processing
Upvotes: 1