3Nex
3Nex

Reputation: 527

How to group values from a table if they're close?

Let's say I define 10 as being a close enough difference between two values, what I want is the average of all the values that are close enough to each other (or in other words, grouped by their closeness). So, if I have a table with the following values:

+-------+
| value |
+-------+
|     1 |
|     1 |
|     2 |
|     4 |
|     2 |
|     1 |
|     4 |
|     3 |
|    22 |
|    23 |
|    24 |
|    22 |
|    20 |
|    19 |
|    89 |
|    88 |
|    86 |
+-------+

I want a query that would output the following result:

+---------+
| 2.2500  |
| 21.6667 |
| 87.6667 |
+---------+

Where 2.2500 would be produced as the average of all the values ranging from 1 to 4 since they're for 10 or less away from each other. In the same way, 21.6667 would be the average of all the values ranging from 19 to 24, and 87.6667 would be the average of all the values ranging from 86 to 89.

Where my specified difference of what is currently 10, would have to be variable.

Upvotes: 0

Views: 234

Answers (4)

Gordon Linoff
Gordon Linoff

Reputation: 1270493

This isn't so bad. You want to implement the lag() function in MySQL to determine if a value is the start of a new set of rows. Then you want a cumulative sum of this value to identify a group.

The code looks painful, because in MySQL you need to do this with correlated subqueries and join/aggregation rather than with ANSI standard functions, but this is what it looks like:

select min(value) as value_min, max(value) as value_max, avg(value) as value_avg
from (select t.value, count(*) as GroupId
      from table t join
           (select value
            from (select value,
                         (select max(value)
                          from table t2
                          where t2.value < t.value
                         ) as prevValue
                  from table t
                 ) t
            where value - prevvalue < 10
           ) GroupStarts
           on t.value >= GroupStarts.value
      group by t.value
     ) t
group by GroupId;

The subquery GroupStarts is finding the break points, that is, the set of values that differ by 10 or more from the previous value. The next level uses join/aggregation to count the number of such break points before any given value. The outermost query then aggregation using this GroupId.

Upvotes: 1

Alex Deiwor
Alex Deiwor

Reputation: 117

At first, I would export to an array the whole result.

Afterwards, use a function

function show(elements_to_agroup=4)
{
for (i = 0; i < count(array) ; i++)
{
sum = 0;    

if (i % elements_to_agroup)
    { 
    sum = sum / elements_to_agroup; 
    return sum;
    }    
else
    {
    sum =+ array[i];
    }
}
}

Upvotes: 0

cgraus
cgraus

Reputation: 812

I like the other user's suggestion to create a hash column. Joining to yourself has an exponential effect, and should be avoided.

One other possibility is to use /, for example select avg(val), val/10 from myTable group by val/10 would have a value of group that is 0 for 0-9, 1 for 10-19, etc.

At least, it works in SQL Server that way

Upvotes: 0

Kevin Seifert
Kevin Seifert

Reputation: 3582

Create another column with a hash value for the field. This field will be used to test for equality. For example with strings you may store a soundex. For numbers you may store the closest multiple of ten

Otherwise doing a calculation will be much slower. You could also cross join the table to itself and group where the difference of the two fields < 10

Upvotes: 0

Related Questions