Reputation: 2299
I have the following memory-speed problem in Matlab and I would like your help to understand whether there may be a solution.
Consider the following 4 big row vectors X1, X2, Y1, Y2.
clear
rng default
P=10^8;
X1=rand(1,P)*5;
X2=rand(1,P)*5;
Y1=rand(1,P)*5;
Y2=rand(1,P)*5;
What I would like to do is a scatter plot where the x-axis shows the sum of every possible pair of one element from X1 and one element from X2, and the y-axis shows the sum of the corresponding pair of elements from Y1 and Y2.
I post here three options I thought about that do not work mainly because of memory and speed issues.
Option 1 (issues: too slow when doing the loop, out of memory when doing vertcat)
Xtemp=cell(P,1);
Ytemp=cell(P,1);
for i=1:P
tic
Xtemp{i}=X1(i)+X2(:);
Ytemp{i}=Y1(i)+Y2(:);
toc
end
X=vertcat(Xtemp{:});
Y=vertcat(Ytemp{:});
scatter(X,Y)
Option 2 (issues: too slow when doing the loop, time increasing as the loop proceeds, Matlab going crazy and unable to produce the scatter even if I stop the loop after 5 iterations)
for i=1:P
tic
scatter(X1(i)+X2(:), Y1(i)+Y2(:))
hold on
toc
end
Option 3 (sort of giving up) (issues: as I increase T the scatter gets closer and closer to a square, which is correct; I am wondering, though, whether this only happens because I generated the data using rand and in option 3 I use randi; maybe with my real data the scatter does not "converge" to the true plot as I increase T; also, what are the "optimal" T and R?).
T=20;
R=500;
for t=1:T
tic
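%select R indices at random from 1..P (randi(R,R,1) would only ever reach the first R elements)
idx1=randi(P,R,1);
idx2=randi(P,R,1);
%use the same indices for X and Y so that each point keeps its (i,j) pairing, as in option 1
X1sel=X1(idx1);
X2sel=X2(idx2);
Y1sel=Y1(idx1);
Y2sel=Y2(idx2);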
%do option 1 among those points and plot
Xtempsel=cell(R,1);
Ytempsel=cell(R,1);
for r=1:R
Xtempsel{r}=X1sel(r)+X2sel(:);
Ytempsel{r}=Y1sel(r)+Y2sel(:);
end
Xsel=vertcat(Xtempsel{:});
Ysel=vertcat(Ytempsel{:});
scatter(Xsel,Ysel, 'b', 'filled')
hold on
toc
end
Is there a way to do what I want, or is it simply impossible?
Upvotes: 1
Views: 62
Reputation: 1580
You are trying to build a vector with P^2 elements, i.e. 10^16. This is many orders of magnitude more than what would fit into the memory of a standard computer (10 GB is 10^10 bytes, or about 1.25 billion double-precision floats).
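To make the arithmetic concrete, here is a rough sketch of the estimate. The 10 GB budget (memBudget) and the variable names are illustrative assumptions, not something taken from the question.

```matlab
P = 1e8;                               % vector length from the question
bytesPerDouble = 8;                    % a double-precision value occupies 8 bytes
nPairs = P^2;                          % number of (i,j) pairs
bytesNeeded = nPairs * bytesPerDouble; % memory for ONE vector of sums (X or Y)
fprintf('Pairs: %.1e, bytes per vector: %.1e\n', nPairs, bytesNeeded);

% Largest P for which one vector of all pairwise sums fits in a given budget
memBudget = 10e9;                      % hypothetical 10 GB budget
Pmax = floor(sqrt(memBudget / bytesPerDouble));
fprintf('Largest P within budget: %d\n', Pmax);
```

Both X and Y are needed for the plot, so the real requirement is twice bytesNeeded, before counting any temporaries or the graphics objects themselves.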
For smaller vectors (i.e. P<1e4), try:
Xsum=bsxfun(@plus,X1,X2.'); %P-by-P matrix: Xsum(j,i)=X1(i)+X2(j)
X=Xsum(:); %Reshape to vector
Ysum=bsxfun(@plus,Y1,Y2.'); %Same layout, so X and Y stay paired element by element
Y=Ysum(:);
plot(X,Y,'.') %Plot as small dots, likely to take forever if there are too many points
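As a sanity check, the pairing can be verified on a tiny example. This is only a sketch: the three-element vectors are made up for illustration, and it uses implicit expansion (available from R2016b), which should give the same result as the bsxfun call.

```matlab
% Tiny made-up inputs (row vectors, like in the question)
X1 = [1 2 3];  X2 = [10 20 30];
Y1 = [4 5 6];  Y2 = [40 50 60];

Xsum = X1 + X2.';   % implicit expansion, same result as bsxfun(@plus,X1,X2.')
Ysum = Y1 + Y2.';
X = Xsum(:);        % column-major flattening
Y = Ysum(:);

% Element (j,i) of Xsum is X1(i)+X2(j); compare against the explicit double loop
k = 0;
for i = 1:numel(X1)
    for j = 1:numel(X2)
        k = k + 1;
        assert(X(k) == X1(i) + X2(j));
        assert(Y(k) == Y1(i) + Y2(j));
    end
end
```

Because Xsum and Ysum are flattened in the same order, the k-th entries of X and Y always come from the same (i,j) pair, which is what the scatter plot needs.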
To build a figure with a more reasonable number of pairs picked randomly from these large vectors:
Npick=1e4;
sel1=randi(P,[Npick,1]);
sel2=randi(P,[Npick,1]);
Xsel=X1(sel1)+X2(sel2);
Ysel=Y1(sel1)+Y2(sel2);
plot(Xsel,Ysel,'.'); %Plot as small dots
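If even the sampled dots overlap too much to read, one possible variation (not part of the approach above) is to accumulate a 2-D histogram over several batches of random pairs and display the counts. The sketch below assumes histcounts2 is available (R2015b or later); Nbatch, the bin edges, and the reduced P are hypothetical choices made to keep the demo light.

```matlab
rng default
P = 1e6;                              % smaller than the question's 1e8, for a quick demo
X1 = rand(1,P)*5;  X2 = rand(1,P)*5;
Y1 = rand(1,P)*5;  Y2 = rand(1,P)*5;

edges  = linspace(0,10,101);          % sums of two values in [0,5] lie in [0,10]
counts = zeros(numel(edges)-1);       % 100-by-100 matrix of bin counts
Nbatch = 20;                          % hypothetical number of batches
Npick  = 1e5;                         % pairs drawn per batch

for b = 1:Nbatch
    sel1 = randi(P,[Npick,1]);        % same indices for X and Y keep the pairing
    sel2 = randi(P,[Npick,1]);
    Xsel = X1(sel1) + X2(sel2);
    Ysel = Y1(sel1) + Y2(sel2);
    counts = counts + histcounts2(Xsel, Ysel, edges, edges);
end

centers = (edges(1:end-1) + edges(2:end)) / 2;
imagesc(centers, centers, counts.');  % transpose: rows of counts index the X bins
axis xy
colorbar
```

Memory use stays fixed at the size of counts no matter how many batches are drawn, so the number of sampled pairs can be increased until the picture stops changing visibly.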
Upvotes: 2