JLDN Admin
JLDN Admin

Reputation: 131

Median and mode in excel using multiple columns

I am working on a spreadsheet for my statistics class. It contains a column filled with a quantity and a column filled with a frequency for each quantity. Using these two columns and several formulas, the mean, mode, median, coefficient variation, quartiles, interquartile range, minimum and maximum values and standard deviation can be found.

The issue is that to shorten the list of quantities, the use of the frequencies column comes in. Take the quantity list of : 10, 12, 14, 15, 18, 18, 10.

Notice that 18 and 10 have been repeated. Instead of repeating those quantities in my quantities column, I increment the frequency of that quantity from 1 to 2. The reason for doing it this way is due to very large lists of quantity data. Meaning I might have 5000 quantities that could potentially be reduced to 20 quantities where each quantity will have a larger frequency.

The issue is that the built-in Median and Mode function use only 1 column or row and do not account for the frequency of each quantity.

My question is this: Is there a way to use both columns to calculate the median and mode? Below is an example of what I was explaining. Additionally, my column containing quantities and frequencies may not be filled from start to end, meaning there maybe empty entries.

Quantity  |  Frequency
  10      |     4
  12      |     6
  11      |     3
  15      |     1
  18      |     10

VS.

Quantity
10
10
10
10
12
12
12
12
12
12
11
11
11
15
18
18
18
18
18
18
18
18
18
18

Upvotes: 2

Views: 2041

Answers (2)

Chronocidal
Chronocidal

Reputation: 7951

Assuming that each Quantity only appears once alongside its Frequency, then you could use an INDEX MATCH on the MAX Frequency to find the Mode:

=INDEX(A2:A6,Match(Max(B2:B6), B2:B6, 0))

This will, of course, only return the first number should there be multiple Quantities which share the same maximum frequency (such as if Quantity 12 also occurred with Frequency 10)

Further assuming that your Quantities are in Ascending Order, you can use Matrix Multiplication (MMULT) to calculate the running total for each row, and use AGGREGATE to get the smallest row where the running total greater-than-or-equal-to half the overall total. This is the Median:

=AGGREGATE(15, 6, A2:A6/(MMULT(--(TRANSPOSE(ROW(B2:B6))<=ROW(B2:B6)), B2:B6)>=0.5*SUM(B2:B6)), 1)

(For reference, MMULT(--(TRANSPOSE(ROW(B2:B6))<=ROW(B2:B6)), B2:B6) is the bit which works out the Running total)

Upvotes: 2

Scott Craner
Scott Craner

Reputation: 152505

If one has Office 365 with the dynamic array formula use:

=MEDIAN(INDEX(A2:A6,MATCH(SEQUENCE(SUM(B2:B6),,0),SUMIF(OFFSET(B1,0,,ROW(B2:B6)-MIN(ROW(B2:B6))+1,),"<>"))))

and

=MODE.SNGL(INDEX(A2:A6,MATCH(SEQUENCE(SUM(B2:B6),,0),SUMIF(OFFSET(B1,0,,ROW(B2:B6)-MIN(ROW(B2:B6))+1,),"<>"))))

enter image description here


If not then this array formula:

=MEDIAN(INDEX(A2:A6,MATCH(ROW($ZZ1:INDEX($ZZ:$ZZ,SUM(B2:B6)))-1,SUMIF(OFFSET(B1,0,,ROW(B2:B6)-MIN(ROW(B2:B6))+1,),"<>"))))

and

=MODE.SNGL(INDEX(A2:A6,MATCH(ROW($ZZ1:INDEX($ZZ:$ZZ,SUM(B2:B6)))-1,SUMIF(OFFSET(B1,0,,ROW(B2:B6)-MIN(ROW(B2:B6))+1,),"<>"))))

Being array formula, then need to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.

enter image description here

Upvotes: 5

Related Questions