Reputation: 23
If I have an array that contains the values [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247]
and I want 4 groups of similar numbers so that the output is 4 matrices:
[6247]
[6712, 6718]
[7023]
[7510, 7509, 7514, 7509]
What would be the best way to accomplish this?
Upvotes: 1
Views: 9363
Reputation: 7530
For your specific case, there is no real need for a clustering procedure, nor for an explicit sorting-based solution.
Assuming that values close to each other (roughly abs(x - x_0) <= 50) define the groups of interest, you can simply bin the values: divide by 50, round, and collect the values that share a bin. Keep in mind that the bins are fixed, so two nearby values that straddle a bin boundary would land in different groups.
For your data this works as follows:
>>> x= [6712 7023 7510 7509 6718 7514 7509 6247]; g= round(x/ 50)
g =
134 140 150 150 134 150 150 125
>>> groups= {}; for g_u= unique(g), groups{end+ 1}= x(g_u== g); end
>>> groups
groups =
{
[1,1] = 6247
[1,2] = 6712 6718
[1,3] = 7023
[1,4] = 7510 7509 7514 7509
}
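A small sketch of how the fixed bins behave near a bin boundary. The values below are hypothetical and chosen only for illustration; they are not from the question. Rounding x/50 places boundaries at 6725, 6775, ..., so closeness alone does not decide the group:

```matlab
binWidth = 50;                       % same bin width as in the answer above
near = [7524 7526];                  % hypothetical values, only 2 apart
far  = [7476 7524];                  % hypothetical values, 48 apart

binsNear = round(near / binWidth);   % 150 151 -> split into two groups
binsFar  = round(far  / binWidth);   % 150 150 -> kept in one group
```

If that behaviour matters for your data, a gap-based criterion on the sorted values may be a better fit than fixed bins.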
Upvotes: 1
Reputation: 124563
I believe the term you are looking for is clustering. For example, we can apply the k-means algorithm to group the data into 4 clusters:
X = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
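% kmeans expects one observation per row, so pass the data as a column
[IDX,C] = kmeans(X(:), 4, 'EmptyAction','singleton');
G = cell(4,1);
for i=1:4
    G{i} = X(IDX==i);   % values assigned to cluster i
end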
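This is one of the results I get (k-means starts from random initial centroids, so the order of the clusters can differ between runs):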
>> G{:}
ans =
7510 7509 7514 7509
ans =
7023
ans =
6247
ans =
6712 6718
Clustering usually works best with more points, and it also works for multidimensional data.
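Because the cluster numbering is arbitrary, it can help to order the groups by their centroids. The following is only a sketch, assuming the Statistics Toolbox kmeans function is available; the 'Replicates' value of 5 and the variable names idx, order and k are my own choices, not part of the answer above:

```matlab
X = [6712, 7023, 7510, 7509, 6718, 7514, 7509, 6247];
k = 4;

% One observation per row; repeat the clustering a few times and keep
% the best run to reduce the effect of the random initialisation.
[idx, C] = kmeans(X(:), k, 'EmptyAction', 'singleton', 'Replicates', 5);

% Sort the cluster centroids so that group 1 holds the smallest values.
[~, order] = sort(C);

G = cell(1, k);
for i = 1:k
    G{i} = X(idx == order(i));   % members of the i-th smallest cluster
end
```

The grouping itself still depends on how k-means converges, so the result should be checked rather than assumed.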
Upvotes: 7
Reputation: 125864
You first have to decide what the criterion is for determining the boundaries of your groups. For example, you could set a threshold value of 50, so that any values that differ from their nearest larger or smaller value by more than the threshold are considered to be in a different group.
You can solve this in a vectorized way by first sorting the array using the function SORT, then finding the indices into the sorted array where the differences between neighboring values are greater than your threshold (i.e. where the group boundaries are) using the functions DIFF and FIND. Taking the differences between these indices (again using the function DIFF) gives you a vector of sizes for each group, which can be used to break the sorted array into a cell array using the function MAT2CELL. Here's what the code would look like:
threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];
sortedArray = sort(array);
nPerGroup = diff(find([1 (diff(sortedArray) > threshold) 1]));
groupArray = mat2cell(sortedArray,1,nPerGroup);
And groupArray will be a 1-by-4 cell array where each cell contains a set of values for a group. Here are the contents of groupArray for the above example:
>> groupArray{:}
ans =
6247
ans =
6712 6718
ans =
7023
ans =
7509 7509 7510 7514
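If you also need to know which group each element of the original, unsorted array belongs to, the same gap test can produce a label vector. This is a sketch under the same threshold assumption; the names order, sortedLabels and labels are introduced here for illustration:

```matlab
threshold = 50;
array = [6712 7023 7510 7509 6718 7514 7509 6247];

% Sort, remembering where each sorted value came from.
[sortedArray, order] = sort(array);

% A new group starts at the first element and after every gap larger
% than the threshold; the running sum turns those starts into labels.
sortedLabels = cumsum([true, diff(sortedArray) > threshold]);

% Map the labels back to the original positions.
labels = zeros(size(array));
labels(order) = sortedLabels;
```

For the example data this gives labels = [2 3 4 4 2 4 4 1], so array(labels == 2) returns 6712 and 6718 in their original order.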
Upvotes: 0
Reputation: 28302
What do you mean by "similar"? For instance, why is 6718 not similar to 7023? Do we mean "difference < N between consecutive ints in a group"?
If so, sort the array and then step through it, identifying boundaries where you need them (i.e., when the difference is too great). Then simply split off a new array.
Such as...
GroupSimilar(values, maxGap)
    result := list()
    values' := sort(values)
    if |values'| = 0 then return result
    temp := list(values'[1])
    for i := 2 to |values'| do
        // a gap larger than maxGap closes the current group
        if values'[i] - values'[i-1] > maxGap then
            result.add(temp)
            temp := list()
        temp.add(values'[i])
    result.add(temp)    // do not forget the last group
    return result
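The sort-and-scan idea can be sketched in MATLAB/Octave as follows. The threshold of 50 (maxGap) is an assumption, since the question does not define "similar", and the sketch assumes the input is non-empty:

```matlab
values = [6712 7023 7510 7509 6718 7514 7509 6247];
maxGap = 50;                 % assumed threshold between neighbours

sorted  = sort(values);
groups  = {};
current = sorted(1);         % start the first group (non-empty input assumed)

for i = 2:numel(sorted)
    if sorted(i) - sorted(i-1) > maxGap
        groups{end+1} = current;   % gap too large: close the current group
        current = [];
    end
    current(end+1) = sorted(i);    % every value joins exactly one group
end
groups{end+1} = current;           % close the last group
```

For the data in the question this yields four groups: 6247, then 6712 6718, then 7023, then 7509 7509 7510 7514.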
Upvotes: 0