Reputation: 23

Grouping similar values in Matlab

If I have an array that contains the values [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247] and I want 4 groups of similar numbers so that the output is 4 matrices:

[6247]
[6712, 6718]
[7023]
[7510, 7509, 7514, 7509]

What would be the best way to accomplish this?

Upvotes: 1

Answers (4)

eat

Reputation: 7530

For your specific case, there's really no need for a complicated (and fairly opaque) clustering procedure, nor even for an explicit sorting-based solution.

Assuming that values close to each other (roughly, abs(x - x_0) <= 50) define the groups of interest, you can proceed in a very simple and straightforward manner.

By exploiting the natural proximity of your values to each other, you could simply divide by 50, round, and collect the values that end up with the same label:

>>> x= [6712 7023 7510 7509 6718 7514 7509 6247]; g= round(x/ 50)
g =
   134 140 150 150 134 150 150 125

>>> groups= {}; for g_u= unique(g), groups{end+ 1}= x(g_u== g); end
>>> groups
groups =
{
  [1,1] =  6247
  [1,2] =  6712 6718
  [1,3] =  7023
  [1,4] =  7510 7509 7514 7509
}
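One thing to keep in mind (an editorial observation, not part of the original answer): rounding x/50 places fixed bin edges at 50k + 25 (..., 6675, 6725, 6775, ...), so two values just either side of an edge get different labels even if they are only a few units apart. The two values below are hypothetical, chosen to straddle the edge at 6725:

```matlab
% Hypothetical values on either side of the bin edge at 6725
% (6725/50 = 134.5, which round() sends up to 135).
a = [6724 6726];
bins = round(a / 50)                 % 134.48 -> 134, 134.52 -> 135: different groups
gapOK = all(diff(sort(a)) <= 50)     % the gap is only 2, so a gap threshold keeps them together
```

For the data in the question no value sits near an edge, so the rounding approach works; a threshold on the gaps between sorted values avoids the edge effect.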

Upvotes: 1

Amro

Reputation: 124563

I believe the term you are looking for is clustering. For example, we can apply the k-means algorithm to group the data into 4 clusters:

X = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
[IDX,C] = kmeans(X(:), 4, 'EmptyAction','singleton');   % kmeans expects one observation per row
G = cell(4,1);
for i=1:4
    G{i} = X(IDX==i);
end

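This is one of the results I get. Because k-means starts from randomly chosen centroids, the cluster numbering (and occasionally the grouping itself) can change from run to run: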

>> G{:}
ans =
        7510        7509        7514        7509
ans =
        7023
ans =
        6247
ans =
        6712        6718

This usually works better with more data points, and it also works for multidimensional data (one observation per row).
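kmeans numbers its clusters arbitrarily, which is why the groups above come out in a different order from the one in the question. One possible way to get them in ascending order is to sort the centroids C and visit the clusters in that order. In the sketch below, to keep it runnable without the Statistics Toolbox, IDX is hard-coded to the labeling implied by the run shown above, and C is recomputed as the per-cluster mean (what kmeans returns as centroids for its default squared Euclidean distance):

```matlab
X   = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
IDX = [4 2 1 1 4 1 1 3]';                   % labels matching the run shown above (normally from kmeans)
k   = max(IDX);
C   = accumarray(IDX, X(:), [k 1], @mean);  % per-cluster mean, like kmeans' second output
[~, order] = sort(C);                       % cluster indices ordered by ascending centroid
G = cell(k, 1);
for i = 1:k
    G{i} = X(IDX == order(i));              % i-th smallest cluster, original element order kept
end
```

With real kmeans output you would drop the hard-coded IDX and C lines and use `[IDX, C] = kmeans(X(:), 4, 'EmptyAction', 'singleton')` directly; C is then a 4-by-1 column, so `sort(C)` works the same way.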

Upvotes: 7

gnovice

Reputation: 125864

You first have to decide what the criterion is for determining the boundaries of your groups. For example, you could set a threshold value of 50, so that any value differing from its nearest larger or smaller value by more than 50 is considered to be in a different group.

You can solve this in a vectorized way by first sorting the array using the function SORT, then finding the indices into the sorted array where the differences between neighboring values are greater than your threshold (i.e. where the group boundaries are) using the functions DIFF and FIND. Taking the differences between these indices (again using the function DIFF) gives you a vector of sizes for each group, which can be used to break the sorted array into a cell array using the function MAT2CELL. Here's what the code would look like:

threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];
sortedArray = sort(array);
nPerGroup = diff(find([1 (diff(sortedArray) > threshold) 1]));
groupArray = mat2cell(sortedArray,1,nPerGroup);

And groupArray will be a 1-by-4 cell array where each cell contains a set of values for a group. Here are the contents of groupArray for the above example:

>> groupArray{:}

ans =

        6247

ans =

        6712        6718

ans =

        7023

ans =

        7509        7509        7510        7514
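Note that the values in each cell come out sorted, whereas the question lists them in their original order (e.g. [7510, 7509, 7514, 7509]). A possible variation, sketched below as an editorial addition, keeps the second output of SORT, turns the boundary flags into per-element group labels with CUMSUM, and maps those labels back to the original positions:

```matlab
threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];
[sortedArray, sortIdx] = sort(array);
% A new group starts wherever the gap to the previous sorted value exceeds threshold
sortedLabels = cumsum([1 (diff(sortedArray) > threshold)]);
labels = zeros(size(array));
labels(sortIdx) = sortedLabels;          % group label of each element, in original order
nGroups = max(labels);
groupArray = cell(1, nGroups);
for k = 1:nGroups
    groupArray{k} = array(labels == k);  % members of group k, in their original order
end
```

Here `labels` is [2 3 4 4 2 4 4 1], and `groupArray` holds {6247, [6712 6718], 7023, [7510 7509 7514 7509]}, matching the output requested in the question.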

Upvotes: 0

Patrick87

Reputation: 28302

What do you mean by "similar"? For instance, why is 6718 not similar to 7023? Do we mean "difference < N between consecutive ints in a group"?

If so, sort the array and then step through it, identifying boundaries where you need them (i.e., when the difference is too great). Then start a new array at each boundary.

Something like this:

  GroupSimilar(values, N)
     result := list()
     values' := sort(values)
     temp := list(values'[1])
     for i := 1 to |values'| - 1 do
        if values'[i+1] - values'[i] < N then
           temp.add(values'[i+1])
        else
           result.add(temp)
           temp := list(values'[i+1])
     result.add(temp)
     return result
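A direct MATLAB translation of the loop-based approach might look like the sketch below. The gap limit N = 50 is an assumption borrowed from the other answers, and the code assumes `values` is non-empty:

```matlab
N = 50;                                   % assumed gap limit, as in the other answers
values = [6712 7023 7510 7509 6718 7514 7509 6247];
sortedValues = sort(values);
result = {};                              % cell array of groups
temp = sortedValues(1);                   % current group starts with the smallest value
for i = 1:numel(sortedValues) - 1
    if sortedValues(i+1) - sortedValues(i) < N
        temp(end+1) = sortedValues(i+1);  % still within the current group
    else
        result{end+1} = temp;             % gap too large: close the current group
        temp = sortedValues(i+1);         % and start a new one
    end
end
result{end+1} = temp;                     % add the last group
```

This yields the same four groups as the vectorized SORT/DIFF/MAT2CELL answer; the vectorized version avoids growing arrays inside a loop, which matters more for large inputs.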

Upvotes: 0
