user29853
user29853

Reputation:

How to calculate average of a column of numbers linked to each frequency bin making up a histogram, Excel 2010?

I have three columns. Column A consists of numbers, column B consists of bin ranges, and column C consists of number data relevant to the individual data in column A.

Using columns A and B, I created a frequency histogram where all the data in column A have been grouped into the bins of column B. I would like to calculate the average value of each bin using the data from column C (i.e., calculate a mean value for each bin using data from column C that is associated to each value (from column A) that made up each bin).

Can anybody help?

Thanks for the replies. Here is an example of the data (Unfortunately I can not paste in images):

Below are three columns with headers Jar Type (in volume (ml)), Cookies (he number of chocolate chip cookies in the jar), and Interval for bins (bins to count the jar types):

  1. Jar type-cookies-intervals for bins
  2. 500 3 100
  3. 500 1 150
  4. 500 0.5 200
  5. 250 3 250
  6. 150 1 300
  7. 500 1 350
  8. 150 2 400
  9. 250 2 450
  10. ### # 500

Making a histogram of the frequency of jar types gives this grouping:

  1. Bin-Frequency
  2. 100 0
  3. 150 2
  4. 200 0
  5. 250 2
  6. 300 0
  7. 350 0
  8. 400 0
  9. 450 0
  10. 500 4
  11. More 0

Now what I am trying to do is to find out what is the mean number of cookies that can be found in each type of jar. For example, for the 500ml we know that there are 4x500ml jars, and that in each of the 500ml we have 3+1+0.5+1 = 5.5 cookies in total. the mean would be 1.735 cookies.

My issue is that I have 5000+ numbers that separate into 100 bins.

Upvotes: 2

Views: 5595

Answers (1)

whuber
whuber

Reputation: 2494

The question calls for a "wandering trace" of a scatterplot: the values of column A (plot them on the horizontal axis) are placed into bins, which therefore comprise vertical strips in the scatterplot. The values of column C (plotted on the vertical axis) are averaged within each strip. This technique smooths out and summarizes apparent trends in the scatterplot.

Spreadsheet

In this example with 100 records the original data are in black and computed values are in green. Here is the wandering trace of means:

Plot

The open circles plot column C (associated values) against column A (data) while the solid squares, connected with a dashed red trace, plot the bin means (column G) against the midpoints (column F).

Any statistical package will provide functions for grouping data and performing operations on those groups. Excel does this to a limited extent with its SUMIF and COUNTIF functions. To use them, create a column (D in the spreadsheet) showing the grouping factor. (That's a simple lookup in the sorted BINS vector using the VLOOKUP function with its "range" option set to true.) SUMIF computes sums by group factor and COUNTIF counts by group factor. Their ratios are the bin means.

Here is what the formulas look like:

Formulas

Only three formulas were actually entered and then copied down as needed:

  • =VLOOKUP(A2, Bins, 1, TRUE) computes the group for the value in cell A2. Bins a name for the array $(-2,-3, \ldots, 3)$ in column B.

  • =AVERAGE(B3:B4) computes the midpoint of the first bin. This was used as a horizontal plotting position in the scatterplot.

  • =SUMIF(Bin,"="&B3,NewValues)/COUNTIF(Bin, "="&B3) is where all the work is done. Bin refers to the group codes in column D and NewValues refers to the associated values in column C. The tricky parts are the constructs "="&B3: these form a text value instructing the data to be grouped by comparison to the number in cell B3, which is the first endpoint. Because this is a formula, copying it down automatically updates the B3 to B4, then B5, etc.

Upvotes: 3

Related Questions