Reputation: 111
Suppose I have a reordered matrix A
, which is get by using A = nchoosek(1:100, 6)
in MATLAB. Thus, A
means all the combinations by selecting 6 digits from 1 to 100.
A
is reordered, and the dim of A
is very large (1e9*6), so A
is like this:
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
. . . . . .
. . . . . .
94 96 97 98 99 100
95 96 97 98 99 100
Then, I have another reordered vector B
, which is a member of A
. Such as B=[5 10 11 40 51 67];
So, how to find the index of B
using the fastest way, which means utilizing the ordering information??
Upvotes: 1
Views: 64
Reputation: 35176
You should note how the rows of A
are generated. I suspect this is not specified in the documentation, so you probably shouldn't rely on it too much. What I mean is that this is technically undocumented behaviour, and it could change with any version.
But note that each element loops over starting from the ending indices, and every combination is sorted. This means that you need to start from the beginning, in
B = [5 10 11 40 51 67];
the first index is 5, which means that all the elements starting with 4
have been enumerated. How many are those?
nchoosek(100-1,6-1)+nchoosek(100-2,6-1)+nchoosek(100-3,6-1)+nchoosek(100-4,6-1)
% [1 . . . . .] [2 . . . . .] [3 . . . . .] [4 . . . . .]
% ^ 2:100 ^ 3:100 ^ 4:100 ^ 5:100
since we have 99 numbers to choose 5 from if the first element is 1, 98 to choose 5 from if the first number is 2, etc.
Next come the elements [5 6 7 8 9 10]
through [5 9 97 98 99 100]
. Again, these are full combinations, if the second element is fixed:
nchoosek(100-6,6-2)+nchoosek(100-7,6-2)+nchoosek(100-8,6-2)+nchoosek(100-9,6-2)
% [5 6 . . . .] [5 7 . . . .] [5 8 . . . .] [5 9 . . . .]
% ^ 7:100 ^ 8:100 ^ 9:100 ^ 10:100
And so on, until you build up all of [5 10 11 40 50 .]
, which contains the last term of nchoosek(100-50,6-5)
. Then all that is left is to count the elements from [5 10 11 40 51 52]
to [5 10 11 40 51 67]
, which is 67-52+1
.
So, formalizing it, if your index vector B
has n elements, each called b_k
for k=1:n
:
sum_{k=1:n} sum_{b=b_{k-1}+1:b_{k}-1} nchoosek(100-b,n-k)
which will work if we define b_0
to be zero for the first index, and note that nchoosek(m,0)==1
.
So here's a straightforward function that should work. It's not optimal in its kind, but it sure beats checking 1e9 vectors against each other:
function ind = find_chooseind(N,B)
% N is the N in 1:N in nchoosek(1:N,K)
% length(B) == K
% define auxiliary B with a zero prepended to avoid out-of-bounds errors
Blen = length(B);
B = [0; B(:)];
ind = 0;
for k=2:Blen+1 % shifted for that first zero
for b=B(k-1)+1:B(k)-1
ind = ind + nchoosek(N-b,Blen-(k-1)); % compensate for k shift
end
end
% testing reveals an off-by-one error, not to worry
ind = ind+1;
I tested the above code with a smaller example:
>> N = 20; K = 4;
>> A20_4 = nchoosek(1:N,K);
>> t = A20_4(randi(nchoosek(N,K)),:);
>> ind = find_chooseind(N,t);
>> A20_4(ind,:)
ans =
7 8 9 14
>> t
t =
7 8 9 14
Upvotes: 3