Kristada673

Reputation: 3744

How to find the "i"th most frequent element in a vector without using loops in MATLAB?

For the most frequent element, we use mode. To find the ith most frequent element, the only approach I know is to repeatedly delete the most frequent element, then the second most frequent, and so on up to the (i-1)th most frequent element, from the dataset:

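% i is the rank we want (e.g. i = 2); data(:,1) holds the values
for n = 1:i-1
   % remove every row whose value is the current most frequent value
   data = data(data(:,1) ~= mode(data(:,1)), :);
end
% rows holding the i-th most frequent value
item = data(data(:,1) == mode(data(:,1)), :);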

Is there a better, faster way to do this using vectorization instead of a loop?

Upvotes: 1

Views: 135

Answers (3)

Luis Mendo

Reputation: 112659

Here's a way:

x = [2 2 3 1 5 1 1 3 3 3 4 1 2 5 4 3 1 3 3 4]; % data
ii = 2; % find the second most frequent value
sx = sort(x); % sort x
ind = [true diff(sx)~=0]; % logical index of each new value in the sorted vector
counts = diff(find([ind true])); % count of each unique value
vals = sx(ind); % unique values
[~, is] = sort(counts, 'descend'); % counts in decreasing order
result = vals(is(ii)); % value with the ii-th largest count

In this example,

>> vals
vals =
     1     2     3     4     5
>> counts
counts =
     5     3     7     3     2
>> result
result =
     1
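The snippet above builds row vectors with [true diff(sx)~=0], so it assumes x is a row vector; with a column vector that horizontal concatenation fails. A minimal sketch, assuming the input may come in either orientation, reshapes it to a row first and adds a bounds check on ii (the reshape and the check are additions, not part of the original answer):

```matlab
% Sketch: the same sort/diff approach, applied to a column vector.
% Assumption: x may be a row or column vector, so it is reshaped to a row first.
x = [2; 2; 3; 1; 5; 1; 1; 3; 3; 3; 4; 1; 2; 5; 4; 3; 1; 3; 3; 4];
ii = 2;                               % rank of the frequency we want

sx = sort(x(:).');                    % sorted row vector
ind = [true, diff(sx) ~= 0];          % true where a new value starts
counts = diff(find([ind, true]));     % run length of each unique value
vals = sx(ind);                       % unique values, ascending

% Added check: ii cannot exceed the number of distinct values
if ii > numel(vals)
    error('ii = %d exceeds the number of unique values (%d).', ii, numel(vals));
end
[~, is] = sort(counts, 'descend');    % ties keep ascending value order
result = vals(is(ii));                % value with the ii-th largest count
```

Because sort keeps tied elements in their original order, the values 2 and 4 (both counted 3 times) come out as 2 first, then 4.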

Upvotes: 2

rahnema1

Reputation: 15837

Suppose we have a dataset containing these values:

data = [5 5 4 2 5 8 8 5 8 4 ];

To find the most frequent item, mode is the best method, as you noted. To find the ith most frequent item, we can study the histogram of the data, which shows how many times each element is repeated. In MATLAB, the hist function computes the histogram: its first argument is the data and its second argument is the unique values of the elements (used as bin centers, so each unique value gets its own bin), which in this example are [2 4 5 8].

Unique values of the elements:

unique_val = unique(data);   % [2 4 5 8]

Computing the histogram:

[count val] = hist(data, unique_val);
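For this data, count comes out as [1 2 4 3] (one 2, two 4s, four 5s, three 8s). As a cross-check that does not rely on hist, the same counts can be obtained by comparing every element with every unique value. This is a swapped-in alternative, not the method of this answer; bsxfun is used so the sketch also runs on releases before implicit expansion (R2016b), and it builds a numel(data)-by-numel(unique_val) logical matrix, so it suits small inputs:

```matlab
% Sketch: counts equal to hist(data, unique_val), computed by direct comparison.
data = [5 5 4 2 5 8 8 5 8 4];
unique_val = unique(data);                        % [2 4 5 8]

% is_match(k, j) is true when data(k) equals unique_val(j)
is_match = bsxfun(@eq, data(:), unique_val(:).');
count = sum(is_match, 1);                         % occurrences of each unique value
```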

To plot the histogram, you can call hist without output arguments:

hist(data, unique_val);

which gives a figure like this:

      _
      _  _
   _  _  _
_  _  _  _ 
2  4  5  8

Visually, 5 is the most frequent item, 8 is the second most frequent item, and so on.

To find the item numerically, we can sort the histogram in descending order, which gives this figure:

_
_  _
_  _  _  
_  _  _  _
5  8  4  2

So 5 is the first, 8 is the second, and so on.

In MATLAB, we concatenate count and val into a two-column matrix freq:

freq =  [count; val].';

Then sort freq by its first column, count (the minus sign requests a descending sort and 1 selects the first column):

out = sortrows(freq , -1)

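Then out(i,2) is the ith most frequent item. In short, everything explained above leads to the following, written for the first column of the data matrix in the question (for a plain row vector like the example above, use data itself, since data(:,1) of a row vector is just its first element):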

% count the occurrences of each unique value in the first column of data
[count val] = hist(data(:,1),unique(data(:,1)));
freq = [count; val].';
% sort by count, in descending order
out = sortrows(freq,-1);

Now out(i,2) is the ith most frequent element.
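Applied to the row vector from the start of this answer, the pipeline can be sketched as below. The sketch passes data directly instead of data(:,1) and builds freq with count(:) and val(:) so it does not depend on the orientation hist returns; both are adjustments for this example, not part of the answer:

```matlab
% Sketch: hist + sortrows pipeline on the example row vector.
data = [5 5 4 2 5 8 8 5 8 4];
i = 2;                                    % rank we want (2nd most frequent)

[count, val] = hist(data, unique(data));  % one bin centred on each unique value
freq = [count(:), val(:)];                % column 1: count, column 2: value
out = sortrows(freq, -1);                 % sort rows by count, descending
item = out(i, 2);                         % i-th most frequent value
```

Here out is [4 5; 3 8; 2 4; 1 2], matching the sorted figure above, so item is 8.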

Upvotes: 2

sco1

Reputation: 12214

You can utilize accumarray and sort with unique to generate the bin counts for the unique values in your data array.

For example:

function [val, count] = getnthmost(x, n)
% Get unique values in x (uniquevals) and their location (ic)
[uniquevals, ~, ic] = unique(x);

% Accumulate the indices and sort in descending order
[bincounts_sorted, idx] = sort(accumarray(ic, 1), 'descend');

% Get the nthmost value and its count
val = uniquevals(idx(n));
count = bincounts_sorted(n);
end

And a small example:

x = randi(5, 10, 1);
[val, count] = getnthmost(x, 2);

Which returns:

x =

     2     5     1     2     2     1     3     4     3     4


val =

     1


count =

     2

Note that sort handles 'ties' in the order they appear in the array being sorted regardless of sort direction, so if we have [1, 2, 3, 2] our ascending sort indices will be [1, 2, 4, 3] and the descending sort indices will be [3, 2, 4, 1].
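The tie-handling claim can be checked directly on the same vector; this small sketch only inspects the second output of sort in both directions:

```matlab
% Sketch: sort keeps equal elements in their original order in both directions.
v = [1, 2, 3, 2];
[~, asc_idx]  = sort(v);             % the two 2s (positions 2 and 4) stay in order
[~, desc_idx] = sort(v, 'descend');  % the two 2s still come out as 2, then 4
```

In getnthmost this means that, among values with equal counts, the smaller value (which comes first in the output of unique) is ranked first.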

Walkthrough

We use unique here to find all of the unique values in our input array, x. We also store the optional third output, which maps each element of x to the index of its value in the array of unique values. We can then use accumarray to accumulate the value 1 at each subscript we got from unique; in other words, we count how many times each index occurs. We sort these counts in descending order and keep the sort indices, so we can map each count back to its value in the array of unique values. Finally, we use n to pick and return the appropriate value and count.
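To make the walkthrough concrete, the sketch below runs the same steps outside the function and keeps the intermediate results. It uses the x printed in the example above as a fixed column vector instead of randi output, so the numbers are reproducible:

```matlab
% Sketch: intermediate values of getnthmost for a fixed input.
x = [2; 5; 1; 2; 2; 1; 3; 4; 3; 4];
n = 2;

[uniquevals, ~, ic] = unique(x);      % uniquevals = [1;2;3;4;5], x equals uniquevals(ic)
bincounts = accumarray(ic, 1);        % bincounts(k) = times uniquevals(k) occurs
[bincounts_sorted, idx] = sort(bincounts, 'descend');

val = uniquevals(idx(n));             % n-th most frequent value
count = bincounts_sorted(n);          % and how often it occurs
```

The counts are [2;3;2;2;1], so 2 is the most frequent value; 1, 3 and 4 tie with two occurrences each, and the stable sort ranks 1 second, matching the val and count shown above.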

Upvotes: 4
