114
114

Reputation: 926

Counting the Frequencies of Words in Excel Strings

Suppose I have a column of arbitrary length where each cell contains a string of text. Is there a way to determine what words appear most frequently in the column (not knowing in advance which words to check) and subsequently order these words along with their frequencies in a two column table? Would VBA be best for this task?

As an example, a cell might contain the string "This is a string, and the # of characters inthis string is>0." (errors intentional)

Upvotes: 4

Views: 43397

Answers (4)

Stephen McNutt
Stephen McNutt

Reputation: 11

Here's a tiny fix plus an enhancement to the script kindly offered by "Gary's Student". The fix is that while building the collection is apparently not case-sensitive (and this is correct--we probably don't want new items added to the collection that differ only in case from existing items), the IF statement that does the counting IS case-sensitive as written, so it doesn't count correctly. Just change that line to...

If LCase(a) = LCase(v) Then J = J + 1

And here's my enhancement. To use it, you first select one or more columns but NOT their (first) header/label rows. Then run the script, and it gives results for each selected column in a new worksheet--along with that header/label row so you know what you're looking at.

I'm just a dabbler. I just hack stuff when I need to get a job done, so it's not elegant, I'm sure...

Sub FrequencyV2() 'Modified from: https://stackoverflow.com/questions/21858874/counting-the-frequencies-of-words-in-excel-strings
'It determines the frequency of words found in each selected column.
'Puts results in new worksheets.
'Before running, select one or more columns but not the header rows.
    Dim rng As Range
    Dim row As Range
    Dim col As Range
    Dim cell As Range
    Dim ws As Worksheet
    Dim wsNumber As Long 'Used to put a number in the names of the newly created worksheets
    wsNumber = 1
    Set rng = Selection
    For Each col In rng.Columns
        Dim BigString As String, I As Long, J As Long, K As Long
        BigString = ""
        For Each cell In col.Cells
            BigString = BigString & " " & cell.Value
        Next cell
        BigString = Trim(BigString)
        ary = Split(BigString, " ")
        Dim cl As Collection
        Set cl = New Collection
        For Each a In ary
            On Error Resume Next 'This works because an error occurs if item already exists in the collection.
            'Note that it's not case sensitive.  Differently capitalized items will be identified as already belonging to collection.
            cl.Add a, CStr(a)
        Next a
        Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
        ws.Name = "F" & CStr(wsNumber)
        wsNumber = wsNumber + 1
        Worksheets(ws.Name).Cells(1, "A").Value = col.Cells(1, 1).Offset(-1, 0).Value 'Copies the table header text for current column to new worksheet.
        For I = 1 To cl.Count
            v = cl(I)
            Worksheets(ws.Name).Cells(I + 1, "A").Value = v 'The +1 needed because header text takes up row 1.
            J = 0
            For Each a In ary
                If LCase(a) = LCase(v) Then J = J + 1
            Next a
            Worksheets(ws.Name).Cells(I + 1, "B") = J 'The +1 needed because header text takes up row 1.
        Next I
    Next col
End Sub

Upvotes: 1

Gary's Student
Gary's Student

Reputation: 96771

Select a portion of column A and run this small macro ( the table will be placed in cols. B & C :

Sub Ftable()
    Dim BigString As String, I As Long, J As Long, K As Long
    BigString = ""

' Add code to sum both "All" and "all" ' Add code to separate "." "!" etc. from the word preceeding them so that word ' is also counted in the total. For example: "all." should not be reported as 1 ' "all." but "all" be added to the total count of "all" words. ' Would you publish this new code?

    For Each r In Selection 
          BigString = BigString & " " & r.Value
    Next r
    BigString = Trim(BigString)
    ary = Split(BigString, " ")
    Dim cl As Collection
    Set cl = New Collection
    For Each a In ary
        On Error Resume Next
        cl.Add a, CStr(a)
    Next a

    For I = 1 To cl.Count
        v = cl(I)
        Cells(I, "B").Value = v
        J = 0
        For Each a In ary
            If a = v Then J = J + 1
        Next a
        Cells(I, "C") = J
    Next I

End Sub

Upvotes: 11

Marston Gould
Marston Gould

Reputation: 21

Using Google Sheets:

index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),1,$A6+1)&":"&index((Transpose(ArrayFormula(QUERY(TRANSPOSE(SPLIT(JOIN(" ",$B$2)," ")&{"";""}),"select Col1, count(Col2) group by Col1 order by count(Col2) desc limit 20 label Col1 'Word', count(Col2) 'Frequency'",0)))),2,$A6+1)

In the above $B$2 contains the text string

$A6 = 1 will give you the most used word

$A6 = 2 will give you the second most used word etc.

This is set to do 20 most frequent. If you want more, increase the limit value to whatever you want.

Upvotes: 2

WGS
WGS

Reputation: 14179

Given this:

enter image description here

I'll use a pivot table to get this:

enter image description here

Best part is, if I got more, it's easy to get Top 5, 10, etc. And it'll always result to unique indices. From there, there are all manners of editing and calculation you can do. :)

Upvotes: 5

Related Questions