Reputation: 132
I'm creating a set of formulas to analyze different sets of json data. I would like to show the uniqueness for each field in the dataset and the top 3 values per field. The json data is pasted on one of the sheets, and the results of my analyses are shown on a different sheet.
An example of some arbitrary raw data:
For this dataset I can create the following formulas (all similar coloured cells are matrix formulas):
Cell A1 contains a formula that dynamically returns all headers (yellow). If the pasted data contains more fields, this list expands automatically. The pink area also grows or shrinks based on the amount of records and fields in the raw data.
What I would like to know is how to setup the following formulas:
An example of the header formula (yellow):
=LET(SUB,INDIRECT("A8:"&ADDRESS(8,number_of_fields)),SUBSTITUTE(SUBSTRING(SUB,1,FIND(":",SUB)-1),"""","")
(formula translated from dutch syntax)
I know how to manually copy the formulas over, but I'm sure it's possible to convert this into a matrix formula. For example, is there a function like Repeat, but for formulas repeating for x amount of cells?
Edit after answer: Getting close! The top 3 is almost working as intended. The answer below creates the following result on a more complex dataset:
It sometimes leaves a cell empty in the top 3 for that column. Preferably the top 3 values bubble up to the top, where it populates row 2 and 3 if the column only contains 2 variations.
Upvotes: 0
Views: 185
Reputation: 11628
Maybe a little too literal, but the following formula will spill the top 3 and the splitted data as shown in the picture
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
IFERROR(VSTACK(header,_top3,"","",split),""))
split
is all data (below),
_top3
is the top 3 of the frequency of the text per column.
You may only need the _top3
data though..
If I'm not mistaken, this would be the Dutch variant:
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
(I'm Dutch, but I'm not familiar with the Dutch equivalents of the newer functions, since I work with English version and support is contradicting in some times:
NEMEN
might be TAKE
, since it's listed as NEMEN
here https://support.microsoft.com/nl-nl/office/excel-functies-alfabetisch-b3944572-255d-4efb-bb96-c6d90033e188#bm14, but if you click for it, it shows explanation for TAKE
in Dutch (https://support.microsoft.com/nl-nl/office/take-functie-25382ff1-5da1-4f78-ab43-f33bd2e4e003) ).
Edit:
To "drop" the trailing boolean column you can add another condition to DROP
(WEGLATEN
):
WEGLATEN([data],1,-1)
this means dropping the first row of the data (condition 1
) and it's last column (condition -1
):
=LET(data;SPATIES.WISSEN(A1:A9);
f;FILTER(data;LINKS(data;1)="""");
split;WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1;-1);
header;SUBSTITUEREN(TEKST.SPLITSEN(NEMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);1);":");"""";"");
s;REEKS(1;KOLOMMEN(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1)));
count;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;PRODUCTMAT(--(TRANSPONEREN(INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b))=INDEX(WEGLATEN(REDUCE(0;f;LAMBDA(a;b;VERT.STAPELEN(a;TEKST.SPLITSEN(b;","))));1);;b));REEKS(RIJEN(f);;1;0)))));;1);
comb;split&" ("&count&")";
allunique;WEGLATEN(ALS.FOUT(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;UNIEK(INDEX(comb;;b)))));"");;1);
fq;WEGLATEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;RIJEN(f)-INTERVAL(X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b));X.VERGELIJKEN(INDEX(split;;b);INDEX(split;;b))))));-1;1);
_top3;NEMEN(REDUCE(0;s;LAMBDA(a;b;HOR.STAPELEN(a;SORTEREN.OP(INDEX(allunique;;b);INDEX(fq;;b);1))));3;-KOLOMMEN(split));
ALS.FOUT(VERT.STAPELEN(header;_top3;"";"";split);""))
And to cope with columns where there's less than 3 top ranked values:
=LET(data,TRIM(Sheet1!A1:A9),
f,FILTER(data,LEFT(data,1)=""""),
split,DROP(REDUCE(0,f,LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,",")))),1),
header,SUBSTITUTE(TEXTSPLIT(TAKE(split,1),":"),"""",""),
s,SEQUENCE(1,COLUMNS(split)),
count,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,MMULT(--(TRANSPOSE(INDEX(split,,b))=INDEX(split,,b)),SEQUENCE(ROWS(f),,1,0))))),,1),
comb,split&" ("&count&")",
allunique,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,UNIQUE(INDEX(comb,,b))))),""),,1),
fq,DROP(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,ROWS(f)-FREQUENCY(XMATCH(INDEX(split,,b),INDEX(split,,b)),XMATCH(INDEX(split,,b),INDEX(split,,b)))))),-1,1),
_top3,TAKE(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,SORTBY(INDEX(allunique,,b),INDEX(fq,,b),1)))),3,-COLUMNS(split)),
_top3minus,DROP(IFERROR(REDUCE(0,s,LAMBDA(a,b,HSTACK(a,FILTER(INDEX(_top3,,b),INDEX(_top3,,b)<>"")))),""),,1),
IFERROR(VSTACK(header,_top3minus,"","",split),""))
Upvotes: 1