Reputation: 527
Let's say I define 10 as being a close enough difference between two values, what I want is the average of all the values that are close enough to each other (or in other words, grouped by their closeness). So, if I have a table with the following values:
+-------+
| value |
+-------+
| 1 |
| 1 |
| 2 |
| 4 |
| 2 |
| 1 |
| 4 |
| 3 |
| 22 |
| 23 |
| 24 |
| 22 |
| 20 |
| 19 |
| 89 |
| 88 |
| 86 |
+-------+
I want a query that would output the following result:
+---------+
| 2.2500 |
| 21.6667 |
| 87.6667 |
+---------+
Where 2.2500 would be produced as the average of all the values ranging from 1 to 4 since they're for 10 or less away from each other. In the same way, 21.6667 would be the average of all the values ranging from 19 to 24, and 87.6667 would be the average of all the values ranging from 86 to 89.
Where my specified difference of what is currently 10, would have to be variable.
Upvotes: 0
Views: 234
Reputation: 1270493
This isn't so bad. You want to implement the lag()
function in MySQL to determine if a value is the start of a new set of rows. Then you want a cumulative sum of this value to identify a group.
The code looks painful, because in MySQL you need to do this with correlated subqueries and join/aggregation rather than with ANSI standard functions, but this is what it looks like:
select min(value) as value_min, max(value) as value_max, avg(value) as value_avg
from (select t.value, count(*) as GroupId
from table t join
(select value
from (select value,
(select max(value)
from table t2
where t2.value < t.value
) as prevValue
from table t
) t
where value - prevvalue < 10
) GroupStarts
on t.value >= GroupStarts.value
group by t.value
) t
group by GroupId;
The subquery GroupStarts
is finding the break points, that is, the set of values that differ by 10 or more from the previous value. The next level uses join/aggregation to count the number of such break points before any given value. The outermost query then aggregation using this GroupId
.
Upvotes: 1
Reputation: 117
At first, I would export to an array the whole result.
Afterwards, use a function
function show(elements_to_agroup=4)
{
for (i = 0; i < count(array) ; i++)
{
sum = 0;
if (i % elements_to_agroup)
{
sum = sum / elements_to_agroup;
return sum;
}
else
{
sum =+ array[i];
}
}
}
Upvotes: 0
Reputation: 812
I like the other user's suggestion to create a hash column. Joining to yourself has an exponential effect, and should be avoided.
One other possibility is to use /, for example select avg(val), val/10 from myTable group by val/10 would have a value of group that is 0 for 0-9, 1 for 10-19, etc.
At least, it works in SQL Server that way
Upvotes: 0
Reputation: 3582
Create another column with a hash value for the field. This field will be used to test for equality. For example with strings you may store a soundex. For numbers you may store the closest multiple of ten
Otherwise doing a calculation will be much slower. You could also cross join the table to itself and group where the difference of the two fields < 10
Upvotes: 0