thebad69
thebad69

Reputation: 13

How to list most frequent text values within a range?

I'm an intermediate excel user trying to solve an issue that feels a little over my head. Basically, I'm working with a spreadsheet which contains a number of orders associated with customer account #s and which have up to 5 metadata "tags" associated with them. I want to be use that customer account # to pull the 5 most commonly occurring metadata tags in order.

Here is a mock up of the first set of data

Account Number  Order Number    Metadata            
5043                  1         A   B   C   D
4350                  2         B   D       
4350                  3         B   C       
5043                  4         A   D       
5043                  5         C   D       
1204                  6         A   B       
5043                  7         A   D       
1204                  8         D   B       
4350                  9         B   D       
5043                 10         A   C   D   

and the end result I'm trying to create

Account Number  Most Common Tag 2nd 3rd 4th 5th
5043                  A         C   B   N/A
4350                  B         D   C   N/A N/A
1204                  B         A   C   N/A N/A

I was trying to work with the formula suggested here:

=ARRAYFORMULA(INDEX(A1:A7,MATCH(MAX(COUNTIF(A1:A7,A1:A7)),COUNTIF(A1:A7,A1:A7),0)))

But I don't know how to a) use the customer account # as a precondition for counting the text values within the range. b) how to circumvent the fact that the Match forumula only wants to work with a single column of data and c) how to read the 2nd, 3rd, 4th, and 5th most common values from this range.

The way I'm formatting this data isn't set in stone. I suspect the way I'm organizing this information is holding me back from simpler solutions, so any suggestions on re-thinking my organization would be just as helpful as insights on how to create a formula to do this.

Upvotes: 1

Views: 1238

Answers (2)

Max Makhrov
Max Makhrov

Reputation: 18707

You also could get this report:

Account Number     Tag    count 
          1204      B         2
          1204      A         1
          1204      D         1
          4350      B         3
          4350      D         2
          4350      C         1
          5043      D         5
          5043      A         4
          5043      C         3
          5043      B         1

with the formula:

    =QUERY(
     {TRANSPOSE(SPLIT(JOIN("",ArrayFormula(REPT(FILTER(A2:A,A2:A<>"")&",",5))),",")),
      TRANSPOSE(SPLIT(ArrayFormula(CONCATENATE(FILTER(C2:G,A2:A<>"")&" ,")),",")),
      TRANSPOSE(SPLIT(rept("1,",counta(A2:A)*5),","))
     },
   "select Col1, Col2, Count(Col3) where Col2 <>' ' group by Col1, Col2
    order by Col1, Count(Col3) desc label Col1 'Account Number', Col2 'Tag'")

The formula will count the number of occurrences of any tag.

Upvotes: 1

user3717023
user3717023

Reputation:

Implementing this kind of frequency analysis using built-in functions is likely to be a frustrating exercise. Since you are working with Google Sheets, take advantage of the custom functions, written in JavaScript and placed into a script bound to the sheet (Tools > Script Editor).

The function I wrote for this purpose is below. Entering something like =tagfrequency(A2:G100) in the sheet will produce desired output:

+----------------+-----------------+-----+-----+-----+-----+
| Account Number | Most Common Tag | 2nd | 3rd | 4th | 5th |
| 5043           | D               | A   | C   | B   | N/A |
| 4350           | B               | D   | C   | N/A | N/A |
| 1204           | B               | A   | D   | N/A | N/A |
+----------------+-----------------+-----+-----+-----+-----+

Custom function

function tagFrequency(arr) {
  var dict = {};                   // the object in which to store tag counts
  for (var i = 0; i < arr.length; i++) {
    var acct = arr[i][0];
    if (acct == '') {
      continue;                    // ignore empty rows
    }
    if (!dict[acct]) {
      dict[acct] = {};             // new account number
    }
    for (var j = 2; j < arr[i].length; j++) {
      var tag = arr[i][j];
      if (tag) {
        if (!dict[acct][tag]) {
          dict[acct][tag] = 0;      // new tag 
        }
        dict[acct][tag]++;          // increment tag count
      }
    }
  }
                    //  end of recording, begin sorting and output 
  var output = [['Account Number', 'Most Common Tag', '2nd', '3rd', '4th', '5th']];
  for (acct in dict) {
    var tags = dict[acct];
    var row = [acct].concat(Object.keys(tags).sort(function (a,b) {
      return (tags[a] < tags[b] ? 1 : (tags[a] > tags[b] ? -1 : (a > b ? 1 : -1)));
    }));                        // sorting by tag count, then tag name
    while (row.length < 6) {
      row.push('N/A');          // add N/A if needed 
    }
    output.push(row);           // add row to output 
  }
  return output;  
}

Upvotes: 1

Related Questions