Reputation: 257
I have two 2D matrices A
and B
, where the rows indicate trials, and columns indicate samples collected during the trial.
I am in a scenario where A
is available, but B
is collected in real-time. I want to calculate the running average of {A
and the available data for B
}, as B
is being sampled. I thought I could accomplish this by calculating a weighted average of A
and B
and update the weights as trials and samples for B
are being collected. Specifically, I thought I could update the weights and recursively use the values I have already saved from the previous iteration. Below is my code and a plot of the output:
close all;
clear all;
%define the sizes of the matrices -- exact numbers aren't important for illustration
n1 = 5;
n2 = 10;
n3 = 12;
%define a matrix that will act as the history of data already collected
A = randi(10,[n2,n1]);
A_avg = mean(A,1); %averaged across n2 trials to get n1 values
%current acts as "incoming" data
B = randi(10,[n3,n1]); %n3 trials, n1 samples per trial
%preallocate matrices for final solutions
correct_means = zeros(n3,n1);
estimated_means = zeros(n3,n1);
for k1=1:size(B,1) %loop through trials
%get running average in the case where we already have all samples
correct_means(k1,:) = mean([A;B([1:k1],:)],1);
for k2=1:size(B,2) %k2 should loop through samples
%calculate averages as samples are incoming recursively (weighted averaging)
if k1>1
estimated_means(k1,k2) = (n2 / (n2+k1)) * A_avg(k2)...
+ ((k1-1)/(n2+k1)) * estimated_means(k1-1,k2) + (1/(n2+k1)) * B(k1,k2);
elseif k1==1
estimated_means(k1,k2) = (n2 / (n2+k1)) * A_avg(k2)...
+ ((k1-1)/(n2+k1)) * estimated_means(k1,k2) + (1/(n2+k1)) * B(k1,k2);
end
% if k1==2, keyboard; end
end
end
%plot the results
figure; hold on;
plot(nan, 'b', 'displayname', 'correct solution');
plot(nan, 'k--', 'displayname', 'my solution');
leg_tmp = legend('show');
set(leg_tmp,'Location','Best');
plot(correct_means, 'b', 'displayname', 'correct solution');
plot(estimated_means, 'k--', 'displayname', 'my solution');
ylabel('running averages');
xlabel('samples');
The plot attached outlines my attempted solution (black) and what I believe to be the correct answer (blue). Note that I am only plotting the average after acquiring all samples for all trials, but I am saving the running average as the data is collected. As you can see - my answers seem slightly off.
My idea was that A
should be updated by the fraction of trials used to determine it's average to the total number of trials as B
is collected. Likewise, the weight for the current sample of B
is simply 1 divided by the total number of current trials in the iteration, and the previous samples of B
are recursively called upon and weighted accordingly. These weights add up to 1 and make sense to me, so I'm having a hard time seeing where I messed up.
Can anyone see where I'm messing up?
Upvotes: 1
Views: 175
Reputation: 24169
You should consider that the longer a code is, the more bugs it tends to accumulate.
I apologize in advance that it was easier for me to rewrite the business logic than to find the bug in your code - so while the result may not be exactly what you were looking for, it does provide a fix. I hope you could still find this useful.
Please look at a slightly simplified version that appear to produce the correct results:
function q60180320
rng(60180320); % For reproducibility
% define the sizes of the matrices
n1 = 5;
n2 = 10;
n3 = 12;
% define a matrix that will act as the history of data already collected
A = randi(10,[n2,n1]);
A_avg = mean(A,1); %averaged across n2 trials to get n1 values
% current acts as "incoming" data
B = randi(10,[n3,n1]); %n3 trials, n1 samples per trial
correct_means = cumsum([A;B],1)./(1:n2+n3).'; correct_means = correct_means(n2+1:end,:);
% preallocate matrices
estimated_means = zeros(n3+1,n1);
estimated_means(1,:) = A_avg; % trick to avoid an if-clause inside the loop
for k1 = 1:size(B,1) % Loop through trials
%% Compute weights:
totalRows = n2 + k1;
W_old = (totalRows - 1)./totalRows;
W_new = 1/totalRows;
%% Incoming measurement (assuming an entire row of B becomes available at a time)
newB = B(k1,:);
%% Compute running average
estimated_means(k1+1,:) = W_old * estimated_means(k1,:) + W_new * newB;
end
estimated_means = estimated_means(2:end,:); % remove the first row;
% plot the results
figure; hold on;
plot(nan, 'k', 'displayname', 'correct solution');
plot(nan, 'w--', 'displayname', 'my solution');
leg_tmp = legend('show');
set(leg_tmp,'Location','EastOutside');
plot(correct_means, 'k', 'displayname', 'correct solution', 'LineWidth', 2);
plot(estimated_means, 'w--', 'displayname', 'my solution');
ylabel('running averages');
xlabel('samples');
Upvotes: 2