Reputation: 926
Suppose I have a column of arbitrary length where each cell contains a string of text. Is there a way to determine what words appear most frequently in the column (not knowing in advance which words to check) and subsequently order these words along with their frequencies in a two column table? Would VBA be best for this task?
As an example, a cell might contain the string "This is a string, and the # of characters inthis string is>0." (errors intentional)
Upvotes: 4
Views: 43397
Reputation: 11
Here's a tiny fix plus an enhancement to the script kindly offered by "Gary's Student". The fix is that while building the collection is apparently not case-sensitive (and this is correct--we probably don't want new items added to the collection that differ only in case from existing items), the IF statement that does the counting IS case-sensitive as written, so it doesn't count correctly. Just change that line to...
If LCase(a) = LCase(v) Then J = J + 1
And here's my enhancement. To use it, you first select one or more columns but NOT their (first) header/label rows. Then run the script, and it gives results for each selected column in a new worksheet--along with that header/label row so you know what you're looking at.
I'm just a dabbler. I just hack stuff when I need to get a job done, so it's not elegant, I'm sure...
Sub FrequencyV2() 'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
'It determines the frequency of words found in each selected column.
'Puts results in new worksheets.
'Before running, select one or more columns but not the header rows.
Dim rng As Range
Dim row As Range
Dim col As Range
Dim cell As Range
Dim ws As Worksheet
Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
wsNumber = 1
Set rng = Selection
For Each col In rng.Columns
Dim BigString As String, I As Long, J As Long, K As Long
BigString = ""
For Each cell In col.Cells
BigString = BigString & " " & cell.Value
Next cell
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next 'This works because an error occurs if item already exists in the collection.
'Note that it's not case sensitive. Differently capitalized items will be identified as already belonging to collection.
cl.Add a, CStr(a)
Next a
Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
ws.Name = "F" & CStr(wsNumber)
wsNumber = wsNumber + 1
Worksheets(ws.Name).Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value 'Copies the table header text for current column to new worksheet.
For I = 1 To cl.Count
v = cl(I)
Worksheets(ws.Name).Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
J = 0
For Each a In ary
If LCase(a) = LCase(v) Then J = J + 1
Next a
Worksheets(ws.Name).Cells(I + 1, "B") = J 'The +1 needed because header text takes up row 1.
Next I
Next col
End Sub
Upvotes: 1
Reputation: 96771
Select a portion of column A and run this small macro ( the table will be placed in cols. B & C :
Sub Ftable()
Dim BigString As String, I As Long, J As Long, K As Long
BigString = ""
' Add code to sum both "All" and "all" ' Add code to separate "." "!" etc. from the word preceeding them so that word ' is also counted in the total. For example: "all." should not be reported as 1 ' "all." but "all" be added to the total count of "all" words. ' Would you publish this new code?
For Each r In Selection
BigString = BigString & " " & r.Value
Next r
BigString = Trim(BigString)
ary = Split(BigString, " ")
Dim cl As Collection
Set cl = New Collection
For Each a In ary
On Error Resume Next
cl.Add a, CStr(a)
Next a
For I = 1 To cl.Count
v = cl(I)
Cells(I, "B").Value = v
J = 0
For Each a In ary
If a = v Then J = J + 1
Next a
Cells(I, "C") = J
Next I
End Sub
Upvotes: 11
Reputation: 21
Using Google Sheets:
index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)
In the above $B$2 contains the text string
$A6 = 1 will give you the most used word
$A6 = 2 will give you the second most used word etc.
This is set to do 20 most frequent. If you want more, increase the limit value to whatever you want.
Upvotes: 2
Reputation: 14179
Given this:
I'll use a pivot table to get this:
Best part is, if I got more, it's easy to get Top 5, 10, etc. And it'll always result to unique indices. From there, there are all manners of editing and calculation you can do. :)
Upvotes: 5