KatyB
KatyB

Reputation: 3980

replace nan values with the mean of other rows

Consider the following example:

Jday = datenum('2007-01-01 00:00','yyyy-mm-dd HH:MM'):1:...
    datenum('2009-12-31 23:00','yyyy-mm-dd HH:MM');
Jday2 = datenum('2008-01-01 00:00','yyyy-mm-dd HH:MM'):1:...
    datenum('2010-12-31 23:00','yyyy-mm-dd HH:MM');

Data{1} = [Jday;1+(length(Jday)-1).*rand(1,length(Jday))]';
Data{2} = [Jday2;1+(length(Jday)-1).*rand(1,length(Jday))]';
Data{3} = [Jday;1+(length(Jday)-1).*rand(1,length(Jday))]';

for i = 1:length(Data);
    d_nan{i} = floor(1+(length(Jday)-1).*rand(1,100));
end

for i = 1:length(Data);
    a(i) = length(d_nan{i});
    for ii = 1:a(i);
        Data{i}(d_nan{i}(ii),2) = nan;
    end
end

In this example I have a cell array 'Data' which contains three cell arrays, the first column of each cell array contains the Julian date of the value in the second column. There are numerous nan values in each cell array. I would like to replace the nan values in each cell array with the average of the values in the other cells for that specific julian date. My problem is complicated due to the three cells not showing data at the same times (although Data{1} and Data{3} have the same days.) If the julian date were identical between the three cells this problem could be solved by using isnan to find the location of the nans and then replace those rows by the nanmean of the other rows (in the other cells).

Could anyone suggest an effective way of tackling this problem? Thank you for your time

Upvotes: 0

Views: 1369

Answers (3)

wilywampa
wilywampa

Reputation: 1350

This may solve your problem, although it's not exactly what you asked for. At each NaN value, it replaces it with the mean of the values interpolated at that day with NaNs removed from the interpolation.

for i = 1:3
    for j = 1:length(Jday)
        if isnan(Data{i}(j,2))
            d = Data{i}(j,1);
            Data{i}(j,2) = mean([ ...
                interp1(Data{1}(find(~isnan(Data{1}(:,2))),1),Data{1}(find(~isnan(Data{1}(:,2))),2),d) ...
                interp1(Data{2}(find(~isnan(Data{2}(:,2))),1),Data{2}(find(~isnan(Data{2}(:,2))),2),d) ...
                interp1(Data{3}(find(~isnan(Data{3}(:,2))),1),Data{3}(find(~isnan(Data{3}(:,2))),2),d)]);
        end
    end
end

Upvotes: 0

wilywampa
wilywampa

Reputation: 1350

Not sure if you saw the edit (I'm a new user, so still learning how this works), so here is a solution that does what you ask and leaves NaNs where the other two don't have the same date.

for i = 1:3
    for j = 1:length(Jday)
        if isnan(Data{i}(j,2))
            d = Data{i}(j,1);
            if (~isempty(find(Data{:,1}==d,1)) && ...
                ~isempty(find(Data{:,2}==d,1)) && ...
                ~isempty(find(Data{:,3}==d,1)))
                Data{i}(j,2) = nanmean([ ...
                    Data{1}(find(Data{:,1}==d,1),2) ...
                    Data{2}(find(Data{:,2}==d,1),2) ...
                    Data{3}(find(Data{:,3}==d,1),2) ]);
            end
        end
    end
end

Upvotes: 0

Eitan T
Eitan T

Reputation: 32930

A possible solution would be:

  1. Concatenating all data and locating the NaN values:

    D = [vertcat(Data{:})];
    nanidx = find(isnan(D(:, 2)));
    
  2. For each index k of such NaN value in the concatenated matrix D, finding other entries with the same date:

    entries = find(D(:, 1) == D(k, 1));
    
  3. Filter the non-NaN values from those entries and replace the original NaN at position k with the mean value:

    values = D(~isnan(D(entries, 2)) & ~ismember(entries, nanidx), 2);
    D(k, 2) = mean(values);
    
  4. Converting everything back into a cell array:

    Data = mat2cell(D, cellfun('length', Data), 2)';
    

Here's the final, copy-paste friendly code:

D = [vertcat(Data{:})];
nanidx = find(isnan(D(:, 2)));
for k = transpose(nanidx(:))
    entries = find(D(:, 1) == D(k, 1));
    values = D(~isnan(D(entries, 2)) & ~ismember(entries, nanidx), 2);
    D(k, 2) = mean(values);
end
Data = mat2cell(D, cellfun('length', Data), 2)';

Upvotes: 1

Related Questions