Reputation: 3744
For the most frequent element, we use mode. To find the i-th most frequent element, I don't know of any better way than to keep deleting the most frequent element, the second most frequent element, ..., up to the (i-1)-th most frequent element from the dataset:
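for n = 1:i
    m = mode(data(:,1));                 % most frequent value among the remaining rows
    if n < i
        data = data(data(:,1) ~= m, :);  % delete all rows holding that value
    else
        item = data(data(:,1) == m, :);  % rows holding the i-th most frequent value
    end
end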
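As a baseline for the vectorized answers, here is a minimal sketch of the same repeated-deletion idea on a plain row vector instead of the first column of a matrix. The sample vector x, the rank k, and the names remaining, kth_value and kth_count are illustrative assumptions, not part of the question. Note that mode returns the smallest value when several values tie for most frequent, so that is how ties get ranked here.

```matlab
% Sketch of the repeated-deletion approach (x and k are example inputs).
x = [2 2 3 1 5 1 1 3 3 3 4 1 2 5 4 3 1 3 3 4];
k = 2;                                   % rank we want: 2 = second most frequent
remaining = x;
for n = 1:k
    m = mode(remaining);                 % current most frequent value (smallest on ties)
    if n < k
        remaining = remaining(remaining ~= m);   % drop every occurrence of it
    end
end
kth_value = m;                           % k-th most frequent value
kth_count = sum(x == kth_value);         % its number of occurrences in x
disp([kth_value, kth_count])
```

Each pass calls mode and filters the whole vector again, so the work grows with k; that repeated scanning is what the answers avoid.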
Is there a better, faster way to do this using vectorization instead of a loop?
Upvotes: 1
Views: 135
Reputation: 112659
Here's a way:
x = [2 2 3 1 5 1 1 3 3 3 4 1 2 5 4 3 1 3 3 4]; % data
ii = 2; % find the second most frequent value
sx = sort(x); % sort x
ind = [true diff(sx)~=0]; % logical index of each new value in the sorted vector
counts = diff(find([ind true])); % count of each unique value
vals = sx(ind); % unique values
[~, is] = sort(counts, 'descend'); % indices that sort counts in decreasing order
result = vals(is(ii)); % value with the ii-th largest count
In this example,
>> vals
vals =
1 2 3 4 5
>> counts
counts =
5 3 7 3 2
>> result
result =
1
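To make the run-length step concrete, the sketch below repeats the answer's steps on the same x, keeps the intermediate run starts, and lists every value by rank. The names starts and ranked are introduced here for illustration only. Because sort is stable, values with equal counts (2 and 4, both occurring 3 times) keep their ascending order, so ii = 3 gives 2 and ii = 4 gives 4.

```matlab
x = [2 2 3 1 5 1 1 3 3 3 4 1 2 5 4 3 1 3 3 4];
sx = sort(x);
ind = [true diff(sx)~=0];          % true at the first element of each run of equal values
starts = find([ind true]);          % start of each run, plus one position past the end
counts = diff(starts);              % run lengths = occurrences of each unique value
vals = sx(ind);                     % unique values in ascending order
[~, is] = sort(counts, 'descend');  % stable, so tied counts keep ascending value order
ranked = [vals(is); counts(is)]     % row 1: values by rank, row 2: their counts
```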
Upvotes: 2
Reputation: 15837
Suppose we have a dataset containing these values:
data = [5 5 4 2 5 8 8 5 8 4 ];
To find the most frequent item, mode is, as you noted, the best method.
But to find the i-th most frequent item, we study the histogram of the data, which shows how many times each element is repeated.
In MATLAB, the hist function computes the histogram. Its first argument is the data, and its second argument is the vector of bin centers; passing the unique values of the elements, which in this example are [2 4 5 8], gives each distinct value its own bin.
Unique values of the elements:
unique_val = unique(data);   % unique_val = [2 4 5 8]
Compute the histogram:
[count, val] = hist(data, unique_val);
To plot the histogram, call hist without output arguments:
hist(data, unique_val);
This gives a figure like the following:
(Bar chart of the counts: 2 occurs once, 4 twice, 5 four times and 8 three times.)
Visually, 5 is the most frequent item, 8 is the second most frequent, and so on.
To find the item numerically, we can sort the histogram by count in descending order, which gives:
(Sorted bar chart: 5 (four times), 8 (three times), 4 (twice), 2 (once).)
So 5 is the first, 8 is the second, and so on.
In MATLAB, we concatenate count and val into a two-column matrix, freq:
freq = [count; val].';
Then we sort the rows of freq by the first column, count (the minus sign requests a descending sort and 1 refers to the first column):
out = sortrows(freq, -1);
Then out(i,2) is the i-th most frequent item.
In short, everything explained above comes down to this:
% count occurrences of each unique value in the first column
[count, val] = hist(data(:,1), unique(data(:,1)));
freq = [count; val].';
% sort the rows by count, in descending order
out = sortrows(freq, -1);
Now out(i,2) is the i-th most frequent element.
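Putting the steps together on the example data gives the sketch below, with i = 2. One small, assumed change from the answer: count(:) and val(:) force both hist outputs into columns, so the concatenation does not depend on the orientation in which hist returns them.

```matlab
% Rank the example data by frequency with hist and sortrows.
data = [5 5 4 2 5 8 8 5 8 4];
i = 2;                                   % rank we want (shadows the imaginary unit i here)
unique_val = unique(data);               % [2 4 5 8]
[count, val] = hist(data, unique_val);   % one bin centred on each unique value
freq = [count(:), val(:)];               % column 1: counts, column 2: values
out = sortrows(freq, -1);                % rows sorted by count, descending
ith_item = out(i, 2)                     % 8 is the second most frequent value
```

One edge case to keep in mind: if data contains a single distinct value, unique returns a scalar, and hist interprets a scalar second argument as the number of bins rather than as a bin center.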
Upvotes: 2
Reputation: 12214
You can use accumarray and sort together with unique to generate the bin counts for the unique values in your data array.
For example:
function [val, count] = getnthmost(x, n)
% Get unique values in x (uniquevals) and their location (ic)
[uniquevals, ~, ic] = unique(x);
% Count the occurrences of each index and sort the counts in descending order
[bincounts_sorted, idx] = sort(accumarray(ic, 1), 'descend');
% Get the nthmost value and its count
val = uniquevals(idx(n));
count = bincounts_sorted(n);
end
And a small example:
x = randi(5, 10, 1);
[val, count] = getnthmost(x, 2);
Which returns:
x =
2 5 1 2 2 1 3 4 3 4
val =
1
count =
2
Note that sort keeps 'ties' in the order they appear in the array being sorted, regardless of sort direction. For example, for [1, 2, 3, 2] the ascending sort indices are [1, 2, 4, 3] and the descending sort indices are [3, 2, 4, 1].
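The tie-ordering claim can be checked directly with this short sketch; v is just the example vector from the note.

```matlab
v = [1, 2, 3, 2];
[~, asc] = sort(v);              % ascending: the two 2s keep their original order
[~, desc] = sort(v, 'descend');  % descending: the two 2s still keep their original order
disp(asc)                        % 1 2 4 3
disp(desc)                       % 3 2 4 1
```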
We use unique here to find all of the unique values in our input array, x. We also store its optional third output, which maps each value of x to its index in the array of unique values. We can then use accumarray to accumulate ones at the subscripts we got from unique; in other words, we count how often each index occurs. We sort these counts in descending order and keep the sort indices, so we can map each count back to its value in the array of unique values. Finally, we use n to pick and return the appropriate value and count.
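The mapping described above can be seen on the x from the example. In this sketch, the assertion checks that uniquevals(ic) rebuilds x, and the names counts, idx and ranking are introduced for illustration; the ranking line lists every unique value by rank rather than only the n-th one.

```matlab
x = [2 5 1 2 2 1 3 4 3 4]';              % column vector, like randi(5, 10, 1)
[uniquevals, ~, ic] = unique(x);          % uniquevals = [1;2;3;4;5]
assert(isequal(uniquevals(ic), x));       % ic maps each element of x to its unique value
counts = accumarray(ic, 1);               % counts(k) = number of elements with ic == k
[counts_sorted, idx] = sort(counts, 'descend');
ranking = [uniquevals(idx), counts_sorted]    % one row per rank: value, count
n = 2;
val = uniquevals(idx(n));                 % same result as getnthmost(x, 2)
count = counts_sorted(n);
```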
Upvotes: 4