acemtp
acemtp

Reputation: 3019

Algorithm to find average of group of numbers

I have a quite small list of numbers (a few hundred max) like for example this one:

117 99 91 93 95 95 91 97 89 99 89 99 91 95 89 99 89 99 89 95 95 95 89 948 189 99 89 189 189 95 186 95 93 189 95 189 89 193 189 93 91 193 89 193 185 95 89 194 185 99 89 189 95 189 189 95 89 189 189 95 189 95 89 193 101 180 189 95 89 195 185 95 89 193 89 193 185 99 185 95 189 95 89 193 91 190 94 190 185 99 89 189 95 189 189 95 185 95 185 99 89 189 95 189 186 99 89 189 191 95 185 99 89 189 189 96 89 193 189 95 185 95 89 193 95 189 185 95 93 189 189 95 186 97 185 95 189 95 185 99 185 95 185 99 185 95 190 95 185 95 95 189 185 95 189 2451

If you create a graph with X=the number and Y=number of times we see the number, we'll have something like this: Distribution

What I want is to know the average number of each group of numbers. In the example, there's 4 groups and the resulting numbers are 92, 187, 948 and 2451

The number of groups of number is not known.

Do you have any idea of how to create a (simple if possible) algorithm do extract these resulting numbers (if possible in c or pseudo code or English :)

Upvotes: 2

Views: 6989

Answers (5)

Phoenix
Phoenix

Reputation: 4536

In PHP you could do it like this:

$array = array(//an array of numbers);

$average = array_sum($array) / count($array);

With multiple groups of numbers you can do something like:

$array = array(
               array(array of numbers, group1),
               array(array of numbers, group2),
               //etc.
              );

foreach($array as $numbers)
{
     $average[] = array_sum($numbers) / count($numbers);
}

Unless you're looking for the median or mode.

Ah, I see what you're asking now, you're not asking how to find the average, you're asking how to group the numbers up and find the average of each group.

Lets see, you'd have to find the mode, $counts = array_count_values($array)); array_keys(max($counts)); will do that and the keys in $counts will be the values of the original array, with the values in $counts being the number of times that each number shows up. Then you need to figure out where the bigger gaps in the keys in $counts are. You could also array_unique() the array original array and find the gaps in the values.

Wish my statistics teacher had done a bit more than play poker with us, or I could probably figure out the exact statistical method to determine how big the range checked to determine the groups should be.

Upvotes: 0

academicRobot
academicRobot

Reputation: 6107

What you want to do is called clustering. If the data you've shown is typical, a gready approach, such as neighbor joining, should be sufficient. So the procedure is:

1) Apply neighbor joining
2) Apply an (empirically identified) threshold to define the clusters
3) Calculate average of each cluster

Using a package that already has clustering algorithms, such as R, would probably be the easiest course, though neighbor joining is not a particularly hard algorithm.

Upvotes: 4

Klas Lindbäck
Klas Lindbäck

Reputation: 33273

So you are looking for "spikes" in the graph. I'm guessing you are interested in the size and position of each group?

You might use something like this:

Sort the numbers
Loop:
  Take the highest number you have 
  Investigate more numbers until you find a number that is too small to belong to the group (maybe 5% smaller)
  Calculate the average of the selected numbers 
  Let the discarded number be the last number
End loop

Upvotes: 0

AVH
AVH

Reputation: 11516

Here's a way:

  1. Decide what width your bins will be. Let's say 10 (i.e. e.g. numbers > -5 and <= 5 go into bin 0, numbers > 5 and <= 15 go into bin 1, ...).
  2. Create a list which holds lists to the number in each bin. I'd go with something like map<unsigned int, vector<unsigned int> * > in C++.
  3. Now iterate over the numbers, decide what bin they belong to. Check if there's already a vector for this bin in your map, if not create one. Add the number to the vector.
  4. After iterating over all the numbers, simply calculate the average of each vector.

Upvotes: 0

Sarfaraz Nawaz
Sarfaraz Nawaz

Reputation: 361362

I think std::map<int,int> can easily solve this problem. The key of the map would be the number, and value would be the times/frequency the number occurs.

So the average can be calculated as,

int average = (m[key] * key) / count;

Where count is total number of numbers, so it calculates the average of each group over all numbers, as you didn't clearly mention what you mean by average. I'm also assuming that each distinct number forms its own group!

Upvotes: 0

Related Questions