Peter Tomlinson
Peter Tomlinson

Reputation: 13

Excel - standard deviation where source cells are a count

I have some data that looks like this

Condition 1 Condition 2 Condition 3 Condition 4 Condition 5
0 0 0 70 0
0 50 10 0 0
120 0 0 5 5

Where the value in each cell is the number of meters of an asset that is the given condition. Or in other words, a count of the number of meters that are a '4'. How do I calculate a standard deviation for this? Obviously the std.dev would be '0' for the first row, higher for row 2, and fairly low for row 3. Something similar to REPT, but that repeats a value x times in a formula? I considered a helper column but the number of meters that there are in total makes this impractical.

Upvotes: 0

Views: 340

Answers (3)

Peter Tomlinson
Peter Tomlinson

Reputation: 13

I ended up figuring it out. I added a column which calculated the average. (Say column F) I then had a formula like this

=SQRT(SUM(A2*POWER((1-F2),2),B2*POWER((2-F2),2),C2*POWER((3-F2),2),D2*POWER((4-F2),2),E2*POWER((5-F2),2))/SUM(A2:E2))

Essentially this calculated the variance from the mean for each condition value, multiplied by the number of values (e.g. number of meters) of asset that are that particular condition, then did the normal other standard deviation calculations (sum, divide by total, square).

Upvotes: 0

Tom Sharpe
Tom Sharpe

Reputation: 34255

Another way of doing it is to use the alternative SD formula

enter image description here

which would give you

=SQRT((SUM(A2:E2*COLUMN(A2:E2)^2)-SUM(A2:E2*COLUMN(A2:E2))^2/SUM(A2:E2))/SUM(A2:E2))

for the population standard deviation.

The Excel 365 version using Let is more readable I think:

=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
sumxsq,SUM(mpy*x^2),
sumsqx,SUM(mpy*x)^2,
numerator,sumxsq-sumsqx/n,
SQRT(numerator/n)
)

A bit less obviously, you could get it from the original formula

enter image description here

=SQRT(SUM(A2:E2*(COLUMN(A2:E2)-SUM(A2:E2*COLUMN(A2:E2))/SUM(A2:E2))^2/SUM(A2:E2)))

Again, in Excel 365 you could write this as:

=LET(x,COLUMN(A2:E2),
mpy,A2:E2,
n,SUM(mpy),
xbar,SUM(mpy*x/n),
numerator,SUM(mpy*(x-xbar)^2),
SQRT(numerator/n)
)

enter image description here

Change the denominator to

(SUM(A2:E2)-1)

for the sample standard deviation.

Upvotes: 1

Terry W
Terry W

Reputation: 3257

I am not a math expert, but I can show you how to "make a range of numbers" based on the criteria shown, using Excel 365.

Suppose your data is in the range B2:F4 as shown below. In cell G2, enter the following formula and drag it down:

=STDEV.P(--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]"))

The above will calculate the standard deviation using the STDEV.P function, but I am unsure if this is the right function to use as there are many other variations to the original STDEV function.

Regardless, the following part of the formula is able to return a range of numbers as desired:

=--FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,REPT($B$1:$F$1&"</s><s>",$B2:$F2))&"</s></t>","//s[number()=.]")

You can view this question and the answer by JvdV to understand the use of the FILTERXML function.

sample

Upvotes: 2

Related Questions