Reputation: 73
I have a matrix like the one below. I want to delete rows that have duplicate values in the first column, keeping only the row whose second-column value occurs the fewest times in the second column. My matrix `d =
1 1
2 1
4 1
8 2
2 2
5 4
2 4
6 4
7 3
` I want to remove the duplicates of the number 2 in the first column, keeping the row whose second-column value has the fewest duplicates in the second column. Required result:
1 1
4 1
8 2
2 2
5 4
6 4
7 3
Thanks for the help. Best regards.
Upvotes: 1
Views: 412
Reputation: 22215
Create a function that, given a value from the left column, returns the right-column value with the fewest occurrences among the matching rows:
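function Out = getMinDuplicate (Index, Data)
% Column-2 values of the rows whose column-1 value equals Index
Candidates = Data(Data(:,1) == Index, 2);
% Occurrence count of each value 1..max in column 2 (assumes positive integers)
Hist = histc (Data(:,2), 1 : max(Data(:,2)));
% Pick the candidate with the fewest occurrences
[~, Pos] = min (Hist(Candidates));
Out = Candidates(Pos);
end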
Call this function for all unique values in column 1:
>> [unique(d(:,1)), arrayfun(@(x) getMinDuplicate(x, d), unique(d(:,1)))]
ans =
1 1
2 2
4 1
5 4
6 4
7 3
8 2
(where d is your data array).
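The output above is sorted by the first column, while the question lists the rows in their original order. A minimal sketch of one way to get that order back, assuming `d` contains no exact duplicate rows (the variable names `kept` and `result` are illustrative and not part of the answer):

```matlab
d = [1 1; 2 1; 4 1; 8 2; 2 2; 5 4; 2 4; 6 4; 7 3];

% Result of the approach above (sorted by column 1), copied from the output shown.
kept = [1 1; 2 2; 4 1; 5 4; 6 4; 7 3; 8 2];

% Keep the rows of d that also appear in kept, preserving the row order of d.
result = d(ismember(d, kept, 'rows'), :);
disp(result)
```

If `d` could contain identical rows, every copy of a kept row would survive this filter, so an index-based approach would be safer in that case.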
Upvotes: 1
Reputation: 15837
We can sort the array by its first column and replace each element of the second column with its occurrence count, sorted in descending order within each group, to obtain this array:
1 3
2 3
2 3
2 2
4 3
5 3
6 3
7 1
8 2
If we then apply unique to the first column of this array, keeping the last occurrence of each value, we get the indices of the desired rows, which can then be extracted:
1 1
2 2
4 1
5 4
6 4
7 3
8 2
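The selection step can be sketched in isolation on the intermediate array shown earlier. The `'last'` option is passed explicitly here as an assumption about the intended behavior, because which occurrence `unique` returns by default is not the same in every MATLAB and Octave version:

```matlab
% Intermediate array: column 1 sorted, counts descending within each group.
b_sorted = [1 3; 2 3; 2 3; 2 2; 4 3; 5 3; 6 3; 7 1; 8 2];

% Index of the last row of each column-1 group, i.e. the smallest count.
[~, idx] = unique(b_sorted(:,1), 'last');
disp(idx(:).')
```

For the value 2 in the first column this picks the fourth row, whose count of 2 is the smallest in its group.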
If the order of the original data should be preserved, more steps are required; they are commented in the code.
a=[...
1 1
2 1
4 1
8 2
2 2
5 4
2 4
6 4
7 3];
%steps to replace each element of column 2 with its count
[a2_sorted, i_a2_sorted] = sort(a(:,2));
[a2_sorted_unique, i_a2_sorted_unique] = unique(a2_sorted);
h = hist(a2_sorted, a2_sorted_unique);
%count = repelems(h, [1:numel(h); h]);%octave
count = repelem(h, h);
[~,a2_back_idx] = sort(i_a2_sorted);
count = count (a2_back_idx);
b = [a(:,1) , count.'];
%counts should be sorted in descending order
%because the last element of each category is extracted by unique;
%'last' is passed explicitly since the default occurrence depends on the version
[b_sorted, i_b_sorted] = sortrows(b,[1 -2]);
[~, i_b1_sorted_unique] = unique(b_sorted(:,1), 'last');
c = [b_sorted(:,1) , a(i_b_sorted,2)];
out = c(i_b1_sorted_unique,:)
%more steps to recover the original order
[~,b_back_idx] = sort(i_b_sorted);
idx_logic = false(1,size(a,1));
idx_logic(i_b1_sorted_unique) = true;
idx_logic = idx_logic(b_back_idx);
out = c(b_back_idx(idx_logic),:)
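For comparison, a shorter sketch of the same idea that keeps the original row order directly. It is not the answer's code: it swaps in `accumarray` for the counting and `diff` for the group boundaries, and the names `grp`, `occurrences`, `order`, `is_first`, and `keep` are illustrative. When two rows tie for the smallest count, it keeps the one that appears first in `a`, which is an assumption the question does not address.

```matlab
a = [1 1; 2 1; 4 1; 8 2; 2 2; 5 4; 2 4; 6 4; 7 3];

% Count how often each column-2 value occurs, one count per row of a.
[~, ~, grp] = unique(a(:,2));
occurrences = accumarray(grp(:), 1);
count = occurrences(grp(:));

% Order the rows by column 1, then by ascending count.
[~, order] = sortrows([a(:,1), count], [1 2]);

% The first row of each column-1 group now has the smallest count.
sorted_keys = a(order, 1);
is_first = [true; diff(sorted_keys) ~= 0];

% Map back to original row numbers and restore the original order.
keep = sort(order(is_first));
out = a(keep, :);
disp(out)
```

Sorting counts in ascending order and taking the first row of each group avoids relying on which occurrence `unique` returns.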
Upvotes: 2