user1580957


Finding standard deviation using only mean, min, max?

I want to find the standard deviation:

Minimum = 5
Mean = 24
Maximum = 84

Overall score = 90

I just want to work out my grade using the standard deviation.

Thanks,

Upvotes: 2

Views: 60435

Answers (5)

Russ Thils

Reputation: 150

This works only under some very strict assumptions: you are 100% confident you have the highest and lowest values for the scenario, even without knowing the other values (for example, you know the highest and lowest scores for a given exam), AND you assume the scores follow a normal distribution. In that case you can get a fairly close estimate of the standard deviation by multiplying the difference between the max and min by 0.997 and then dividing by 6, since 6 sigma (i.e. 3 standard deviations on either side of the mean) covers ~99.7% of the population for a normal distribution.

StdDev = ((max - min)*.997)/6

This is a much better approximation than the often-cited rule of thumb of dividing the range by 4. Four sigma covers only about 95% of a normal population, so it falls fairly far short of the two endpoints that you (as the statistician) have attested are the absolute min and max (i.e. 100%, not 95%) for that particular scenario.

You could use the same trick to multiply the full range by .95 and then divide by 4, but that is better suited to a scenario where someone has told you that 95% of the population lies between two values. When you know the hard max and min of the full range, why use a less accurate approximation? You are certain you have 100% of the range, and .997 is within 0.3% of it. Reducing the range (by multiplying it by .997) gives you the span that corresponds to 6 sigma (3 standard deviations either side of the average), so you can divide it by 6 and get a close estimate of the standard deviation, at least as close as is possible without analyzing the rest of the dataset.
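As a minimal sketch, the two shortcuts can be compared on the numbers from the question. The function name `sd_from_range` and its parameters are made up for illustration, and the result is only meaningful if the normality and exact min/max assumptions above actually hold.

```python
def sd_from_range(minimum, maximum, coverage=0.997, n_sigma=6.0):
    """Rough SD estimate: shrink the range by `coverage`, then divide by `n_sigma`.

    Hypothetical helper; assumes a normal population with known exact min and max.
    """
    return (maximum - minimum) * coverage / n_sigma


# Min and max taken from the question
six_sigma_estimate = sd_from_range(5, 84)                                # (84 - 5) * 0.997 / 6
four_sigma_estimate = sd_from_range(5, 84, coverage=0.95, n_sigma=4.0)   # (84 - 5) * 0.95 / 4
range_over_four = (84 - 5) / 4                                           # plain rule of thumb

print(f"range * 0.997 / 6: {six_sigma_estimate:.2f}")
print(f"range * 0.95 / 4:  {four_sigma_estimate:.2f}")
print(f"range / 4:         {range_over_four:.2f}")
```

The 6-sigma version gives roughly 13.1 for this data, while both divide-by-4 variants land near 19 to 20, so the choice of shortcut matters a great deal.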

Note: this shortcut assumes you are not trying to infer, from probability and sample size, the underlying statistics of a random variable that may produce future sample sets. It assumes that you know the entirety of the data produced (the full population), that you therefore have an exact min and max, and that you know or expect normal distribution characteristics to apply to that data (or apply well enough for the analysis you want). However, the mean you've given is not equidistant from the min and max, which indicates you do not have a normal distribution; it's probably something closer to a log-normal or a Weibull distribution. Additionally, something is not right if your overall score is greater than the max of the sample.

Upvotes: 1

javivr

Reputation: 89

You can obtain an estimate of the geometric mean, sometimes called the geometric mean of the extremes or GME, from the Min and the Max by calculating $GME = \sqrt{Min \cdot Max}$. The SD can then be calculated from your arithmetic mean (AM) and the GME as:


$$SD = \frac{AM}{GME} \sqrt{(AM)^2-(GME)^2}$$

This approach works well for log-normal distributions, or more generally as long as the GME, GM or median is smaller than the AM (otherwise the quantity under the square root is negative).
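A small sketch of this calculation on the question's numbers follows. The function name `sd_from_gme` is hypothetical, and the estimate rests on the assumption that the data are roughly log-normal and that the min and max are both positive.

```python
import math


def sd_from_gme(arith_mean, minimum, maximum):
    """Estimate the SD from the arithmetic mean and the geometric mean of the extremes.

    Hypothetical helper; assumes positive, roughly log-normal data.
    """
    if minimum <= 0 or maximum <= 0:
        raise ValueError("min and max must be positive to take a geometric mean")
    gme = math.sqrt(minimum * maximum)
    if gme >= arith_mean:
        raise ValueError("GME must be smaller than the arithmetic mean")
    return arith_mean / gme * math.sqrt(arith_mean**2 - gme**2)


# Mean, min and max taken from the question
sd_estimate = sd_from_gme(24, 5, 84)
print(f"GME-based SD estimate: {sd_estimate:.2f}")  # about 14.63
```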

Upvotes: 1

Eponymous

Reputation: 6831

I actually did a quick-and-dirty calculation of the type M Rad mentions. It assumes that the distribution is Gaussian, or "normal." This does not apply to your situation but might help others asking the same question. (You can tell your distribution is not normal because the distance from the mean to the max is nowhere near the distance from the mean to the min.) Even if it were normal, you would need something you don't mention: the number of samples (in your case, the number of tests taken).

Readers who DO have a normal population can use the table below to get a rough estimate: divide the difference between your calculated mean and your measured minimum by the expected distance (in standard deviations) for your sample size. On average, the estimate will be off by the given fraction of the true standard deviation. (I have no idea whether it is biased; to get a guess, change the code below to calculate the error without the abs.)

    Num Samples   Expected distance      Expected error
             10                1.55                0.25
             20                1.88                0.20
             30                2.05                0.18
             40                2.16                0.17
             50                2.26                0.15
             60                2.33                0.15
             70                2.38                0.14
             80                2.43                0.14
             90                2.47                0.13
            100                2.52                0.13
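As a sketch of how the table might be applied, the lookup below divides the mean-to-minimum distance by the tabulated expected distance. The dictionary and function names are made up for illustration, and the exam figures in the usage lines (mean 70, minimum 40, 50 students) are hypothetical values, not data from the question.

```python
# Expected distance (in standard deviations) from the sample mean to the
# sample minimum, copied from the simulated table above.
EXPECTED_DISTANCE = {
    10: 1.55, 20: 1.88, 30: 2.05, 40: 2.16, 50: 2.26,
    60: 2.33, 70: 2.38, 80: 2.43, 90: 2.47, 100: 2.52,
}


def sd_from_min_and_mean(sample_mean, sample_min, num_samples):
    """Rough SD estimate for a normal sample; num_samples must be a table key."""
    return (sample_mean - sample_min) / EXPECTED_DISTANCE[num_samples]


# Hypothetical exam: mean 70, lowest score 40, 50 students
estimate = sd_from_min_and_mean(70.0, 40.0, 50)
print(f"Estimated SD: {estimate:.2f}")  # about 13.27
```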

This experiment shows that the "rule of thumb" of dividing the range by 4 to get the standard deviation is in general incorrect -- even for normal populations. In my experiment it only holds for sample sizes between 20 and 40 (and then loosely). This rule may have been what the OP was thinking about.

You can modify the following Python code to generate the table for different values (change max_sample_size), get more accuracy (change num_simulations), or remove the limitation to multiples of 10 (change the arguments to range in the final for loop over idx).

#!/usr/bin/env python3
import random


def min_dist_from_estd_mean(samples):
    """Return the distance from the sample mean down to the sample minimum.

    samples must have at least one entry.
    """
    estd_mean = sum(samples) / len(samples)
    return estd_mean - min(samples)  # Non-negative: min cannot exceed mean


num_simulations = 4095
max_sample_size = 100

# Calculate expected distances for every sample size from 1 to max_sample_size
sum_of_dists = [0] * (max_sample_size + 1)  # +1 so can index by sample size
for iternum in range(num_simulations):
    samples = [random.normalvariate(0, 1)]
    while len(samples) <= max_sample_size:
        sum_of_dists[len(samples)] += min_dist_from_estd_mean(samples)
        samples.append(random.normalvariate(0, 1))
expected_dist = [total / num_simulations for total in sum_of_dists]

# Calculate average absolute error of the estimate on fresh samples.
# The true standard deviation is 1, so the estimate is dist / ave_dist.
sum_of_errors = [0] * len(sum_of_dists)
for iternum in range(num_simulations):
    samples = [random.normalvariate(0, 1)]
    while len(samples) <= max_sample_size:
        ave_dist = expected_dist[len(samples)]
        if ave_dist > 0:  # Skips sample size 1, where the distance is always 0
            sum_of_errors[len(samples)] += \
                abs(1 - (min_dist_from_estd_mean(samples) / ave_dist))
        samples.append(random.normalvariate(0, 1))
expected_error = [total / num_simulations for total in sum_of_errors]

# Print one table row for every tenth sample size
cols = "    {0:>15}{1:>20}{2:>20}"
print(cols.format("Num Samples", "Expected distance", "Expected error"))
cols = "    {0:>15}{1:>20.2f}{2:>20.2f}"
for idx in range(10, len(expected_dist), 10):
    print(cols.format(idx, expected_dist[idx], expected_error[idx]))

Upvotes: 1

bames53

Reputation: 88215

A standard deviation cannot in general be computed from just the min, max, and mean. This can be demonstrated with two sets of scores that have the same min, max, and mean but different (population) standard deviations:

  • 1 2 4 5 : min=1 max=5 mean=3 stdev≈1.5811
  • 1 3 3 5 : min=1 max=5 mean=3 stdev≈1.4142
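The counterexample can be checked with Python's standard `statistics` module. This sketch prints both the population (`pstdev`) and sample (`stdev`) standard deviations for each set; the variable names are arbitrary.

```python
import statistics

scores_a = [1, 2, 4, 5]
scores_b = [1, 3, 3, 5]

for scores in (scores_a, scores_b):
    print(
        scores,
        "min =", min(scores),
        "max =", max(scores),
        "mean =", statistics.mean(scores),
        "pstdev = %.4f" % statistics.pstdev(scores),  # population SD
        "stdev = %.4f" % statistics.stdev(scores),    # sample SD (n - 1)
    )
```

Whichever definition is used, the two sets share min, max, and mean yet differ in standard deviation.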

Also, what does an 'overall score' of 90 mean if the maximum is 84?

Upvotes: 10

M Rad

Reputation: 56

In principle you can estimate the standard deviation from the mean/min/max and the number of elements in the sample. If you assume normality, the min and max of a sample are random variables whose statistics follow from the mean, the stddev, and the number of samples. So given the observed min and max, one can compute (after slogging through the math or running a bunch of Monte Carlo scripts) a confidence interval for the stddev (for example, that it is 80% probable that the stddev is between 20 and 40).

That said, it probably isn't worth doing except in extreme situations.
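One way such a Monte Carlo script might look is sketched below. It relies on the fact that, for a normal sample of size n, the ratio range / stddev has a distribution that depends only on n, so simulated quantiles of that ratio can be inverted into an interval for the stddev. The function names, the sample size of 30, and the simulation settings are all assumptions for illustration; the question does not state a sample size, and this sketch uses only the min and max, not the mean.

```python
import random


def range_ratio_quantiles(n, lower=0.10, upper=0.90, num_simulations=20000, seed=0):
    """Simulate quantiles of (max - min) / sigma for normal samples of size n."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(num_simulations):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]  # sigma = 1
        ratios.append(max(draws) - min(draws))
    ratios.sort()
    last = len(ratios) - 1
    low_q = ratios[min(int(lower * num_simulations), last)]
    high_q = ratios[min(int(upper * num_simulations), last)]
    return low_q, high_q


def sigma_interval(sample_min, sample_max, n, **kwargs):
    """Return an approximate (low, high) interval for sigma, assuming normality."""
    low_q, high_q = range_ratio_quantiles(n, **kwargs)
    observed_range = sample_max - sample_min
    # A large ratio implies a small sigma for the same observed range.
    return observed_range / high_q, observed_range / low_q


# Min and max from the question; n = 30 is a hypothetical sample size
low, high = sigma_interval(5, 84, n=30)
print(f"Approximate 80% interval for sigma: {low:.1f} to {high:.1f}")
```

The interval is only as trustworthy as the normality assumption, which other answers here doubt for the question's data.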

Upvotes: 0
