MikeL

Reputation: 2459

How to efficiently find and merge duplicate entries of a vector in MATLAB?

I am writing MATLAB code to quickly solve the following problem: let X be a random variable distributed according to P(x). Take two independent copies of X, call them X1 and X2, and find the distribution of Y = f(X1, X2), where f(·,·) is a known function.

To solve the above, I start with two vectors x and p such that p(i) = P(x(i)). Suppose they both contain n elements. I can easily compute the n-by-n matrix y such that y(i,j) = f(x(i), x(j)). Furthermore, I can compute the n-by-n matrix p_out such that p_out(i,j) = p(i) * p(j). This means the pair (i,j) contributes probability p_out(i,j) to the outcome Y = y(i,j).
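For concreteness, the setup described above might look as follows. This is only a minimal sketch: the support x, the probabilities p, and the choice f(a,b) = a + b are made-up illustrations and are not part of the problem statement.

```matlab
% Hypothetical example data: a small discrete distribution.
x = [0 1 2];          % support of X
p = [0.2 0.5 0.3];    % p(i) = P(X = x(i)); sums to 1

% Hypothetical choice of f; it must accept arrays element-wise.
f = @(a, b) a + b;

% Evaluate f on every pair (x(i), x(j)).
[X1, X2] = ndgrid(x, x);   % X1(i,j) = x(i), X2(i,j) = x(j)
y = f(X1, X2);             % y(i,j) = f(x(i), x(j))

% Joint probability of two independent copies: outer product of p with itself.
p_out = p(:) * p(:).';     % p_out(i,j) = p(i) * p(j)
```

With this choice of f, several pairs (i,j) give the same value of y, which is exactly the duplicate situation discussed below.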

Now, if all elements of y are distinct, we are almost done. All that remains is to convert the matrices to vectors and perhaps sort them for a tidier output. Suppose we do this by setting

y = y(:);
p_out = p_out(:);
[y, idx] = sort(y);
p_out = p_out(idx);

The problem, however, is that the elements of y are typically not unique. Hence I have to merge the identical elements of y as follows: if y(i) = y(j) (remember that y has now been converted to a vector), then remove y(j) and set p_out(i) = p_out(i) + p_out(j). A dirty way of doing this is a for loop (since y is now sorted, we only need to compare each element with the one that follows it). However, I wonder if there exists a nicer way.
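For reference, the loop-based merge mentioned above could be sketched like this. The example vectors and the names y_merged and p_merged are hypothetical, and the sketch assumes y is already sorted and that duplicates compare exactly equal with ==.

```matlab
% Hypothetical sorted example: y is sorted, p_out holds the matching probabilities.
y     = [1 1 2 3 3 3];
p_out = [0.1 0.2 0.3 0.1 0.2 0.1];

% Start with the first entry, then walk through the rest.
y_merged = y(1);
p_merged = p_out(1);
for k = 2:numel(y)
    if y(k) == y_merged(end)
        % Same value as the last kept entry: fold its probability in.
        p_merged(end) = p_merged(end) + p_out(k);
    else
        % New value: start a new entry.
        y_merged(end+1) = y(k);
        p_merged(end+1) = p_out(k);
    end
end
```

The output vectors grow inside the loop, which is one reason this approach tends to be slow for large n.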

I know that unique would remove the duplicated elements of a vector (hence if we only needed y it would be sufficient). I also know that it returns two index vectors that somehow indicate the position of duplicated elements. However, I cannot think of any nice way to use its outputs to appropriately merge the elements of p as well.

Upvotes: 4

Views: 91

Answers (1)

Luis Mendo

Reputation: 112669

If I understand correctly, this is a job for accumarray:

y = [1 3 2 4 2 5 6 5 5 1]; %// example data
p = [.1 .5 .3 .2 .4 .1 .1 .2 .1 .3]; %// example data

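[y_unique, ~, jj] = unique(y); %// jj(k) is the position of y(k) in y_unique
p_summed = accumarray(jj(:), p(:)).'; %// sum the entries of p that share an index; (:) keeps this independent of vector orientation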

Result:

>> y_unique

y_unique =

     1     2     3     4     5     6

p_summed =

    0.4000    0.7000    0.5000    0.2000    0.4000    0.1000
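Applied to the matrices from the question, the same idea might look as follows. This is a sketch under assumptions: the 2-by-2 matrices y and p_out are made-up example values (they correspond to x = [0 1], p = [0.4 0.6] and f(a,b) = a + b), and the ~ placeholder requires a MATLAB version that supports it.

```matlab
% Hypothetical example matrices, as they would come out of the question's setup.
y     = [0 1; 1 2];                 % y(i,j) = f(x(i), x(j))
p_out = [0.16 0.24; 0.24 0.36];     % p_out(i,j) = p(i) * p(j)

% unique returns its first output sorted, so no separate sort step is needed.
% jj maps each element of y(:) to its position in y_unique.
[y_unique, ~, jj] = unique(y(:));

% Sum the probabilities of all entries that share the same value of y.
p_summed = accumarray(jj(:), p_out(:));
```

Here y_unique and p_summed come out as column vectors; transpose them if row vectors are preferred. Note that unique compares values exactly, so values of f that differ only by floating-point rounding would not be merged.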

Upvotes: 3
