hgus1294
hgus1294

Reputation: 757

Matlab disregarding NaN's in matrix

I have a matrix (X) of doubles containing time series. Some of the observations are set to NaN when there is a missing value. I want to calculate the standard deviation per column to get a std dev value for each column. Since I have NaNs mixed in, a simple std(X) will not work and if I try std(X(~isnan(X)) I end up getting the std dev for the entire matrix, instead of one per column.

Is there a way to simply omit the NaNs from std dev calculations along the 1st dim without resorting to looping?

Please note that I only want to ignore individual values as opposed to entire rows or cols in case of NaNs. Obviously I cannot set NaNs to zero or any other value as that would impact calculations.

Upvotes: 3

Views: 4172

Answers (1)

Wok
Wok

Reputation: 5303

Have a look at nanstd (stat toolbox).

The idea is to center the data using nanmean, then to replace NaN with zero, and finally to compute the standard deviation.

See nanmean below.

  % maximum admissible fraction of missing values
  max_miss = 0.6;  

  [m,n]   = size(x);

  % replace NaNs with zeros.
  inan    = find(isnan(x));
  x(inan) = zeros(size(inan));

  % determine number of available observations on each variable
  [i,j]   = ind2sub([m,n], inan);     % subscripts of missing entries
  nans    = sparse(i,j,1,m,n);        % indicator matrix for missing values
  nobs    = m - sum(nans);

  % set nobs to NaN when there are too few entries to form robust average
  minobs  = m * (1 - max_miss);
  k       = find(nobs < minobs);
  nobs(k) = NaN;

  mx      = sum(x) ./ nobs;

See nanstd below.

  flag = 1; % default: normalize by nobs-1

  % center data
  xc      = x - repmat(mx, m, 1);

  % replace NaNs with zeros in centered data matrix
  xc(inan) = zeros(size(inan));

  % standard deviation
  sx      = sqrt(sum(conj(xc).*xc) ./ (nobs-flag));

Upvotes: 2

Related Questions