user1642513

Fastest way to find rows without NaNs in Matlab

I would like to find the indexes of rows without any NaN in the fastest way possible since I need to do it thousands of times. So far I have tried the following two approaches:

find(~isnan(sum(data, 2)));
find(all(~isnan(data), 2));

Is there a clever way to speed this up or is this the best possible? The dimension of the data matrix is usually thousands by hundreds.

Upvotes: 7

Views: 5444

Answers (4)

Serg

Reputation: 14108

any() is faster than all() or sum(). Try:

idx = find(~any(isnan(data), 2));

Correction: after timing it, the sum() approach seems to be faster:

idx = find(~isnan(sum(data, 2)));

Upvotes: 0

bla

Reputation: 26069

Edit: matrix multiplication can be faster than sum, and the operation is almost twice as fast for matrices larger than 500 x 500 elements (on my machine, running Matlab 2012a). So my solution is:

find(~isnan(data*zeros(size(data,2),1)))

Of the two methods you suggested in the question (denoted f and g below), the first is faster according to timeit; h is the matrix multiplication version:

data=rand(4000);
nani=randi(numel(data),1,500);
data(nani)=NaN;
f= @() find(~isnan(sum(data, 2)));
g= @() find(all(~isnan(data), 2));
h= @() find(~isnan(data*zeros(size(data,2),1)));

timeit(f) 
ans =
     0.0263

timeit(g)
ans =
     0.1489

timeit(h)
ans =
     0.0146
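
One caveat worth checking before adopting f or h: both detect NaN through arithmetic, so they can also flag rows that contain Inf but no NaN, because Inf + (-Inf) and Inf*0 both evaluate to NaN under IEEE arithmetic. The following is a minimal sketch on a made-up 3 x 3 matrix (the variable names ref, bySum and byMul are illustrative and not part of the answer above) that compares the two against an isnan-based test:

```matlab
% Made-up test matrix: row 1 is clean, row 2 holds Inf and -Inf but no NaN,
% row 3 holds a real NaN.
data = [1    2    3;
        Inf -Inf  4;
        5    NaN  6];

% Reference: inspects every element, so Inf is never mistaken for NaN.
ref = find(~any(isnan(data), 2));

% sum-based test from the question: Inf + (-Inf) is NaN, so row 2 is dropped.
bySum = find(~isnan(sum(data, 2)));

% Multiplication-based test: Inf*0 is NaN, so a row containing Inf
% is expected to be dropped here as well.
byMul = find(~isnan(data * zeros(size(data, 2), 1)));

disp(ref.');    % rows 1 and 2
disp(bySum.');  % row 1 only
disp(byMul.');
```

If the data can contain Inf, the any(isnan(...)) form is the safer choice; if it cannot, the faster variants should return the same rows.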

Upvotes: 4

Colin T Bowers

Reputation: 18560

If the NaN density is high enough, then a double loop will be the fastest method. This is because the search of a row can be abandoned as soon as the first NaN is found, whereas the vectorized methods always touch every element. For example, consider the following speed test:

%# Preallocate some parameters
T = 5000; %# Number of rows
N = 500; %# Number of columns
X = randi(5, T, N); %# Sample data matrix
M = 100; %# Number of simulation iterations
X(X == 1) = nan; %# Randomly set some elements of X to nan

%# Your first method
tic
for m = 1:M
    Soln1 = find(~isnan(sum(X, 2)));
end
toc

%# Your second method
tic
for m = 1:M
    Soln2 = find(all(~isnan(X), 2));
end
toc

%# A double loop
tic
for m = 1:M
    Soln3 = ones(T, 1);
    for t = 1:T
        for n = 1:N
            if isnan(X(t, n))
                Soln3(t) = 0;
                break
            end
        end
    end
    Soln3 = find(Soln3);
end
toc

The results are:

Elapsed time is 0.164880 seconds.
Elapsed time is 0.218950 seconds.
Elapsed time is 0.068168 seconds. %# The double loop method

Of course, the NaN density is so high in this simulation (about one element in five is NaN) that none of the rows are NaN-free. But you never said anything about the NaN density of your matrix, so I figured I'd post this answer for general consumption and contemplation :-)
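
The timings alone do not confirm that the loop returns the same indices as the vectorized versions. A minimal sketch of such a check, using a made-up 4 x 3 matrix and a logical keep vector (an assumption of this sketch; the test above uses ones(T, 1) instead):

```matlab
% Small made-up matrix so the result can be checked by eye.
X = [1   2   3;
     nan 5   6;
     7   8   nan;
     10  11  12];
[T, N] = size(X);

% Vectorized reference (the second method from the question).
ref = find(all(~isnan(X), 2));

% Double loop with early exit.
keep = true(T, 1);
for t = 1:T
    for n = 1:N
        if isnan(X(t, n))
            keep(t) = false;
            break  % leaves the inner loop only; the next row is still checked
        end
    end
end
loopIdx = find(keep);

assert(isequal(ref, loopIdx));  % both approaches agree on this matrix
disp(loopIdx.');                % rows 1 and 4
```

The early exit only pays off when NaNs tend to appear early in a row; on rows without any NaN the loop still visits all N columns.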

Upvotes: 2

santiago_apr1

Reputation: 555

Can you tell us more about what you want to do with the indices?

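tStart = cputime;
A = rand(1000,100);              % Some matrix data
ind = cell(100,1);               % NaN count can change per iteration, so use a cell array
for i = 1:100
    A(randi(20,1,100)) = NaN;    % Randomly assign NaN among the first 20 elements
    B = isnan(A);                % B is a logical mask, true where A is NaN
    C = A(~B);                   % C has all non-NaN elements
    ind{i} = find(B);            % ind{i} has the linear indices of all NaNs
end
disp(cputime - tStart)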

For 100 iterations of the loop, this took 0.1404 sec.
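
Whether it helps depends on how the indices are used. If they only serve to select the NaN-free rows, the find() call can possibly be skipped altogether and the logical mask used directly. A minimal sketch under that assumption, on a made-up 3 x 2 matrix (mask, clean and idx are illustrative names):

```matlab
% Made-up data: row 2 contains a NaN.
A = [1   2;
     NaN 4;
     5   6];

mask  = ~any(isnan(A), 2);  % logical column vector, true for NaN-free rows
clean = A(mask, :);         % select the rows directly, without find()
idx   = find(mask);         % numeric row indices, only if they are really needed

disp(clean);
disp(idx.');                % rows 1 and 3
```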

Upvotes: 0
