Reputation: 13
I'm an intermediate excel user trying to solve an issue that feels a little over my head. Basically, I'm working with a spreadsheet which contains a number of orders associated with customer account #s and which have up to 5 metadata "tags" associated with them. I want to be use that customer account # to pull the 5 most commonly occurring metadata tags in order.
Here is a mock up of the first set of data
Account Number Order Number Metadata
5043 1 A B C D
4350 2 B D
4350 3 B C
5043 4 A D
5043 5 C D
1204 6 A B
5043 7 A D
1204 8 D B
4350 9 B D
5043 10 A C D
and the end result I'm trying to create
Account Number Most Common Tag 2nd 3rd 4th 5th
5043 A C B N/A
4350 B D C N/A N/A
1204 B A C N/A N/A
I was trying to work with the formula suggested here:
=ARRAYFORMULA(INDEX(A1:A7,MATCH(MAX(COUNTIF(A1:A7,A1:A7)),COUNTIF(A1:A7,A1:A7),0)))
But I don't know how to a) use the customer account # as a precondition for counting the text values within the range. b) how to circumvent the fact that the Match forumula only wants to work with a single column of data and c) how to read the 2nd, 3rd, 4th, and 5th most common values from this range.
The way I'm formatting this data isn't set in stone. I suspect the way I'm organizing this information is holding me back from simpler solutions, so any suggestions on re-thinking my organization would be just as helpful as insights on how to create a formula to do this.
Upvotes: 1
Views: 1238
Reputation: 18707
You also could get this report:
Account Number Tag count
1204 B 2
1204 A 1
1204 D 1
4350 B 3
4350 D 2
4350 C 1
5043 D 5
5043 A 4
5043 C 3
5043 B 1
with the formula:
=QUERY(
{TRANSPOSE(SPLIT(JOIN("",ArrayFormula(REPT(FILTER(A2:A,A2:A<>"")&",",5))),",")),
TRANSPOSE(SPLIT(ArrayFormula(CONCATENATE(FILTER(C2:G,A2:A<>"")&" ,")),",")),
TRANSPOSE(SPLIT(rept("1,",counta(A2:A)*5),","))
},
"select Col1, Col2, Count(Col3) where Col2 <>' ' group by Col1, Col2
order by Col1, Count(Col3) desc label Col1 'Account Number', Col2 'Tag'")
The formula will count the number of occurrences of any tag.
Upvotes: 1
Reputation:
Implementing this kind of frequency analysis using built-in functions is likely to be a frustrating exercise. Since you are working with Google Sheets, take advantage of the custom functions, written in JavaScript and placed into a script bound to the sheet (Tools > Script Editor).
The function I wrote for this purpose is below. Entering something like =tagfrequency(A2:G100)
in the sheet will produce desired output:
+----------------+-----------------+-----+-----+-----+-----+
| Account Number | Most Common Tag | 2nd | 3rd | 4th | 5th |
| 5043 | D | A | C | B | N/A |
| 4350 | B | D | C | N/A | N/A |
| 1204 | B | A | D | N/A | N/A |
+----------------+-----------------+-----+-----+-----+-----+
function tagFrequency(arr) {
var dict = {}; // the object in which to store tag counts
for (var i = 0; i < arr.length; i++) {
var acct = arr[i][0];
if (acct == '') {
continue; // ignore empty rows
}
if (!dict[acct]) {
dict[acct] = {}; // new account number
}
for (var j = 2; j < arr[i].length; j++) {
var tag = arr[i][j];
if (tag) {
if (!dict[acct][tag]) {
dict[acct][tag] = 0; // new tag
}
dict[acct][tag]++; // increment tag count
}
}
}
// end of recording, begin sorting and output
var output = [['Account Number', 'Most Common Tag', '2nd', '3rd', '4th', '5th']];
for (acct in dict) {
var tags = dict[acct];
var row = [acct].concat(Object.keys(tags).sort(function (a,b) {
return (tags[a] < tags[b] ? 1 : (tags[a] > tags[b] ? -1 : (a > b ? 1 : -1)));
})); // sorting by tag count, then tag name
while (row.length < 6) {
row.push('N/A'); // add N/A if needed
}
output.push(row); // add row to output
}
return output;
}
Upvotes: 1