Computing the subset giving the minimum standard deviation in an array

Question

Let us have a vector of size N. For example:

x = rand(N,1)

I want to compute the minimum standard deviation of a subset of length K in the vector.

When N and K are small, it is easy to find the best subset since I can use nchoosek(N,K) to enumerate all possible subsets. But when the values of N and K are larger than let's say N=50 and K=25, nchoosek fails to compute the combinations since the size of the possible subsets are huge.

I wonder if there is a better algorithm to compute the subset that gives the smallest standard deviation in an array efficiently. For example via dynamic programming. Any ideas?

Update:

I've implemented it in a loop aftter the answers and compared to the combinatorial solution. The results are always the same but the speed gain are unprecedented.

n = 20;
k = 10;
x = rand(n,1);
C = nchoosek(x, k);

tic
mins = realmax;
for i = 1:size(C,1)
    s = std(C(i,:));
    if s < mins
        mins = s;
        bestC = C(i,:);
    end
end
toc

tic
[x2, j] = sort(x);
mins2 = realmax;
for i = 1:(n-k+1)
    s = std(x2(i:(i+k-1)));
    if s < mins2
        mins2 = s;
        idx = j((i:(i+k-1)));
    end
end
toc

if mins == mins2
    'Equal'
end

gives

Elapsed time is 7.786579 seconds.
Elapsed time is 0.002068 seconds.

ans =

Equal

YXD · Accepted Answer

Sort the array and then compute this in one pass with rolling window of length K.

I'm sure this will give you the right answer, will think if I can prove it.

Hand-wavey argument, with a possible gap in logic in the "extend this" part:

Consider an element x from your list. Let's try and find the minimum standard deviation of a set of size 2 containing this element. We'll get this by choosing x and the closest element to x. Extend this to k elements and we'll get a set that is a contiguous part of the sorted list that contains x. To pick the smallest subset of k elements (i.e for any x) we therefore just have to iterate over the sorted list as described before.

Computing the subset giving the minimum standard deviation in an array

Answers (1)

Related Questions