Reputation: 2299
Suppose I have three matrices A_1
, A_2
, A_3
each of dimension mxn
, where m
and n
are large. These matrices contain strictly positive numbers.
I want to construct a matrix check
of dimension mx1
such that, for each i=1,...,m
:
check(i)=1
if there exists j,k
such that
A_1(i,1)+A_2(j,1)+A_3(k,1)<=quantile(A_1(i,2:end)+A_2(j,2:end)+A_3(k,3:end), 0.95)
In my case m
is large (m=10^5
) and n=500
. Therefore, I would like your help to find an efficient way to do this.
Below I reproduce an example. I impose m
smaller than in reality and report my incomplete and probably inefficient attempt to construct check
.
clear
rng default
m=4;
n=500;
A_1=betarnd(1,2,m,n);
A_2=betarnd(1,2,m,n);
A_3=betarnd(1,2,m,n);
check=zeros(m,1);
for i=1:m
for j=1:m
for k=1:m
if A_1(i,1)+A_2(j,1)+A_3(k,1)<=quantile(A_1(i,2:end)+A_2(j,2:end)+A_3(k,2:end), 0.95)
check(i)=1;
STOP TO LOOP OVER j AND k, MOVE TO THE NEXT i (INCOMPLETE!)
else
KEEP SEARCHING FOR j,k SUCH THAT THE CONDITION IS SATISFIED (INCOMPLETE!)
end
end
end
end
Upvotes: 4
Views: 165
Reputation: 15837
Given a scalar x
and a vector v
the expression x <=quantile (v, .95)
can be written as sum( x > v) < Q
where Q = .95 * numel(v)
*.
Also A_1
can be splitted before the loop to avoid extra indexing.
Moreover the most inner loop can be removed in favor of vectorization.
Af_1 = A_1(:,1);
Af_2 = A_2(:,1);
Af_3 = A_3(:,1);
As_1 = A_1(:,2:end);
As_2 = A_2(:,2:end);
As_3 = A_3(:,2:end);
Q = .95 * (n -1);
for i=1:m
for j=1:m
if any (sum (Af_1(i) + Af_2(j) + Af_3 > As_1(i,:) + As_2(j,:) + As_3, 2) < Q)
check(i) = 1;
break;
end
end
end
More optimization can be achieved by rearranging the expressions involved in the inequality and pre-computation:
lhs = A_3(:,1) - A_3(:,2:end);
lhsi = A_1(:,1) - A_1(:,2:end);
rhsj = A_2(:,2:end) - A_2(:,1);
Q = .95 * (n - 1);
for i=1:m
LHS = lhs + lhsi(i,:);
for j=1:m
if any (sum (LHS > rhsj(j,:), 2) < Q)
check(i) = 1;
break;
end
end
end
Upvotes: 4
Reputation: 1390
Option 1:
Because all numbers are positive, you can do some optimizations. 95 percentile will be only higher if you add A1
to the mix - if you find the j
and k
of greatest 95 percentile of A2+A3
on the right side compared to the sum of the first 2 elements, you can simply take that for every i
.
maxDif = -inf;
for j = 1 : m
for k = 1 : m
newDif = quantile(A_2..., 0.95) - A_2(j,1)-A_3(k,1);
maxDif = max(newDif, maxDif);
end
end
If even that is too slow, you can first get maxDifA2
and maxDifA3
, then estimate that maxDif
will be for those particular j
and k
values and calculate it.
Now, for some numbers you will get that maxDif > A_1
, then the check
is 1. For some numbers you will get that maxDif + quantile(A1, 0.95) < A_1
, here check
is 0 (if you estimated maxDif
by separate calculation of A2 and A3 this isn't true!). For some (most?) you will unfortunately get values in between and this won't be helpful at all. Then what remains is option 2 (it is also more straightforward):
Option 2:
You could save some time if you can save summation A_2+A_3
on the right side, as that calculation repeats for every different i
, but that requires A LOT of memory. But quantile
is the more expensive operation anyway, so you aren't saving a lot of time. Something along the lines of
for j = 1 : m
for k = 1 : m
A23R(j,k,:) = A2(j,:)+A3(k,:); % Unlikely to fit in memory.
end
end
Then you can perform your loops, using A23R and avoiding to repeat that sum for every i
.
Upvotes: 0