Reputation: 586
I am trying to create random lines and select some of them, which are really rare. My code is rather simple, but to get something that I can use I need to create very large vectors(i.e.: <100000000 x 1, tracks variable in my code). Is there any way to be able to creater larger vectors and to reduce the time needed for all those calculations?
My code is
%Initial line values
tracks=input('Give me the number of muon tracks: ');
width=1e-4;
height=2e-4;
Ystart=15.*ones(tracks,1);
Xstart=-40+80.*rand(tracks,1);
%Xend=-40+80.*rand(tracks,1);
Xend=laprnd(tracks,1,Xstart,15);
X=[Xstart';Xend'];
Y=[Ystart';zeros(1,tracks)];
b=(Ystart.*Xend)./(Xend-Xstart);
hot=0;
cold=0;
for i=1:tracks
if ((Xend(i,1)<width/2 && Xend(i,1)>-width/2)||(b(i,1)<height && b(i,1)>0))
plot(X(:, i),Y(:, i),'r');%the chosen ones!
hold all
hot=hot+1;
else
%plot(X(:, i),Y(:, i),'b');%the rest of them
%hold all
cold=cold+1;
end
end
I am also using and calling a Laplace distribution generator made my Elvis Chen which can be found here
function y = laprnd(m, n, mu, sigma)
%LAPRND generate i.i.d. laplacian random number drawn from laplacian distribution
% with mean mu and standard deviation sigma.
% mu : mean
% sigma : standard deviation
% [m, n] : the dimension of y.
% Default mu = 0, sigma = 1.
% For more information, refer to
% http://en.wikipedia.org./wiki/Laplace_distribution
% Author : Elvis Chen ([email protected])
% Date : 01/19/07
%Check inputs
if nargin < 2
error('At least two inputs are required');
end
if nargin == 2
mu = 0; sigma = 1;
end
if nargin == 3
sigma = 1;
end
% Generate Laplacian noise
u = rand(m, n)-0.5;
b = sigma / sqrt(2);
y = mu - b * sign(u).* log(1- 2* abs(u));
The result plot is
Upvotes: 2
Views: 306
Reputation: 38032
As you indicate, your problem is two-fold. On the one hand, you have memory issues because you need to do so many trials. On the other hand, you have performance issues, because you have to process all those trials.
Solutions to each issue often have a negative impact on the other issue. IMHO, the best approach would be to find a compromise.
More trials are only possible of you get rid of those gargantuan arrays that are required for vectorization, and use a different strategy to do the loop. I will give priority to the possibility of using more trials, possibly at the cost of optimal performance.
When I execute your code as-is in the Matlab profiler, it immediately shows that the initial memory allocation for all your variables takes a lot of time. It also shows that the plot
and hold all
commands are the most time-consuming lines of them all. Some more trial-and-error shows that there is a disappointingly low maximum value for the trials
you can do before OUT OF MEMORY
errors start appearing.
The loop can be accelerated tremendously if you know a few things about its limitations in Matlab. In older versions of Matlab, it used to be true that loops should be avoided completely in favor of 'vectorized' code. In recent versions (I believe R2008a and up), the Mathworks introduced a piece of technology called the JIT accelerator (Just-in-Time compiler) which translates M-code into machine language on the fly during execution. Simply put, the JIT accelerator allows your code to bypass Matlab's interpreter and talk much more directly with the underlying hardware, which can save a lot of time.
The advice you'll hear a lot that loops should be avoided in Matlab, is no longer generally true. While vectorization still has its value, any procedure of sizable complexity that is implemented using only vectorized code is often illegible, hard to understand, hard to change and hard to upkeep. An implementation of the same procedure that uses loops, often has none of these drawbacks, and moreover, it will quite often be faster and require less memory.
Unfortunately, the JIT accelerator has a few nasty (and IMHO, unnecessary) limitations that you'll have to learn about.
One such thing is plot
; it's generally a better idea to let a loop do nothing other than collect and manipulate data, and delay any plotting commands etc. until after the loop.
Another such thing is hold
; the hold
function is not a Matlab built-in function, meaning, it is implemented in M-language. Matlab's JIT accelerator is not able to accelerate non-builtin functions when used in a loop, meaning, your entire loop will run at Matlab's interpretation speed, rather than machine-language speed! Therefore, also delay this command until after the loop :)
Now, in case you're wondering, this last step can make a HUGE difference -- I know of one case where copy-pasting a function body into the upper-level loop caused a 1200x performance improvement. Days of execution time had been reduced to minutes!).
There is actually another minor issue in your loop (which is really small, and rather inconvenient, I will immediately agree with) -- the name of the loop variable should not be i
. The name i
is the name of the imaginary unit in Matlab, and the name resolution will also unnecessarily consume time on each iteration. It's small, but non-negligible.
Now, considering all this, I've come to the following implementation:
function [hot, cold, h] = MuonTracks(tracks)
% NOTE: no variables larger than 1x1 are initialized
width = 1e-4;
height = 2e-4;
% constant used for Laplacian noise distribution
bL = 15 / sqrt(2);
% Loop through all tracks
X = [];
hot = 0;
ii = 0;
while ii <= tracks
ii = ii + 1;
% Note that I've inlined (== copy-pasted) the original laprnd()
% function call. This was necessary to work around limitations
% in loops in Matlab, and prevent the nececessity of those HUGE
% variables.
%
% Of course, you can still easily generalize all of this:
% the new data
u = rand-0.5;
Ystart = 15;
Xstart = 800*rand-400;
Xend = Xstart - bL*sign(u)*log(1-2*abs(u));
b = (Ystart*Xend)/(Xend-Xstart);
% the test
if ((b < height && b > 0)) ||...
(Xend < width/2 && Xend > -width/2)
hot = hot+1;
% growing an array is perfectly fine when the chances of it
% happening are so slim
X = [X [Xstart; Xend]]; %#ok
end
end
% This is trivial to do here, and prevents an 'else' in the loop
cold = tracks - hot;
% Now plot the chosen ones
h = figure;
hold all
Y = repmat([15;0], 1, size(X,2));
plot(X, Y, 'r');
end
With this implementation, I can do this:
>> tic, MuonTracks(1e8); toc
Elapsed time is 24.738725 seconds.
with a completely negligible memory footprint.
The profiler now also shows a nice and even distribution of effort along the code; no lines that really stand out because of their memory use or performance.
It's possibly not the fastest possible implementation (if anyone sees obvious improvements, please, feel free to edit them in). But, if you're willing to wait, you'll be able to do MuonTracks(1e23)
(or higher :)
I've also done an implementation in C, which can be compiled into a Matlab MEX file:
/* DoMuonCounting.c */
#include <math.h>
#include <matrix.h>
#include <mex.h>
#include <time.h>
#include <stdlib.h>
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout);
/* simple little helper functions */
double sign(double x) { return (x>0)-(x<0); }
double rand_double() { return (double)rand()/(double)RAND_MAX; }
/* the gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int
dims[] = {1,1};
const mxArray
/* Output arguments */
*hot_out = plhs[0] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*cold_out = plhs[1] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*X_out = plhs[2] = mxCreateDoubleMatrix(2,10000, mxREAL);
const unsigned long long
tracks = (const unsigned long long)mxGetPr(prhs[0])[0];
unsigned long long
*hot = (unsigned long long*)mxGetPr(hot_out),
*cold = (unsigned long long*)mxGetPr(cold_out);
double
*Xout = mxGetPr(X_out);
/* call the actual function, and return */
CountMuons(tracks, hot,cold, Xout);
}
// The actual muon counting
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout)
{
const double
width = 1.0e-4,
height = 2.0e-4,
bL = 15.0/sqrt(2.0),
Ystart = 15.0;
double
Xstart,
Xend,
u,
b;
unsigned long long
i = 0ul;
*hot = 0ul;
*cold = tracks;
/* seed the RNG */
srand((unsigned)time(NULL));
/* aaaand start! */
while (i++ < tracks)
{
u = rand_double() - 0.5;
Xstart = 800.0*rand_double() - 400.0;
Xend = Xstart - bL*sign(u)*log(1.0-2.0*fabs(u));
b = (Ystart*Xend)/(Xend-Xstart);
if ((b < height && b > 0.0) || (Xend < width/2.0 && Xend > -width/2.0))
{
Xout[0 + *hot*2] = Xstart;
Xout[1 + *hot*2] = Xend;
++(*hot);
--(*cold);
}
}
}
compile in Matlab with
mex DoMuonCounting.c
(after having run mex setup
:) and then use it in conjunction with a small M-wrapper like this:
function [hot,cold, h] = MuonTrack2(tracks)
% call the MEX function
[hot,cold, Xtmp] = DoMuonCounting(tracks);
% process outputs, and generate plots
hot = uint32(hot); % circumvents limitations in 32-bit matlab
X = Xtmp(:,1:hot);
clear Xtmp
h = NaN;
if ~isempty(X)
h = figure;
hold all
Y = repmat([15;0], 1, hot);
plot(X, Y, 'r');
end
end
which allows me to do
>> tic, MuonTrack2(1e8); toc
Elapsed time is 14.496355 seconds.
Note that the memory footprint of the MEX version is slightly larger, but I think that's nothing to worry about.
The only flaw I see is the fixed maximum number of Muon counts (hard-coded as 10000 as the initial array size of Xout
; needed because there are no dynamically growing arrays in standard C)...if you're worried this limit could be broken, simply increase it, change it to be equal to a fraction of tracks
, or do some smarter (but more painful) dynamic array-growing tricks.
Upvotes: 3
Reputation: 7213
In Matlab, it is sometimes faster to vectorize rather than use a for
loop. For example, this expression:
(Xend(i,1) < width/2 && Xend(i,1) > -width/2) || (b(i,1) < height && b(i,1) > 0)
which is defined for each value of i
, can be rewritten in a vectorised manner like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0)
Expessions like Xend(:,1)
will give you a column vector, so Xend(:,1) < width/2
will give you a column vector of boolean values. Note then that I have used &
rather than &&
- this is because &
performs an element-wise logical AND, unlike &&
which only works on scalar values. In this way you can build the entire expression, such that the variable isChosen
holds a column vector of boolean values, one for each row of your Xend
/b
vectors.
Getting counts is now as simple as this:
hot = sum(isChosen);
since true
is represented by 1
. And:
cold = sum(~isChosen);
Finally, you can get the data points by using the boolean vector to select rows:
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values
hold all;
plot(X(:, ~isChosen),Y(:, ~isChosen),'b'); % Plot unchosen values
EDIT: The code should look like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0);
hot = sum(isChosen);
cold = sum(~isChosen);
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values
Upvotes: 2