Reputation: 125
I am an R newbie, so hopefully this is a solvable problem for some of you. I have a data frame containing more than a million data points. My goal is to compute a weighted mean with a varying starting point.
To illustrate, consider this frame A ( A <- data.frame(matrix(c(1,2,3,2,2,1),3,2)) ):
X1 X2
1 1 2
2 2 2
3 3 1
where X1 is the data and X2 is the sampling weight.
I want to compute the weighted mean of X1 from starting point 1 to 3, then from 2 to 3, and from 3 to 3.
With a loop I simply wrote:
B <- rep(NA,3) #empty result vector
for(i in 1:3){
B[i] <- weighted.mean(x=A$X1[i:3],w=A$X2[i:3]) #shifting the starting point of the data and weights further to the end
}
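For reference, here is a self-contained version of that loop (the frame is named A, as in the snippet above), run on the three-row example:

```r
# The example frame: X1 is the data, X2 the sampling weights
A <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))

n <- nrow(A)
B <- rep(NA, n)  # empty result vector
for (i in 1:n) {
  # shift the starting point of the data and weights towards the end
  B[i] <- weighted.mean(x = A$X1[i:n], w = A$X2[i:n])
}
B
# [1] 1.800000 2.333333 3.000000
```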
With my real data this is impractical: every iteration subsets the data frame anew, and the computation runs for hours without finishing.
Is there a way to implement a varying starting point with an apply command, so that the performance improves?
regards, Ruben
Upvotes: 3
Views: 781
Reputation: 40821
Building upon @joran's answer to produce the correct result:
with(A, rev(cumsum(rev(X1*X2)) / cumsum(rev(X2))))
# [1] 1.800000 2.333333 3.000000
Also note that this is much faster than the sapply/lapply approach.
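The reason the one-liner works: the weighted mean from i to n is sum(X1[i:n] * X2[i:n]) / sum(X2[i:n]), and rev(cumsum(rev(v))) computes exactly those suffix sums of a vector v in one vectorized pass. A minimal sketch on the question's example frame (here named A):

```r
# Example frame from the question: X1 is the data, X2 the weights
A <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))

num <- rev(cumsum(rev(A$X1 * A$X2)))  # suffix sums of X1*X2 (numerators)
den <- rev(cumsum(rev(A$X2)))         # suffix sums of the weights (denominators)
num / den
# [1] 1.800000 2.333333 3.000000
```

Because this touches each element only a constant number of times, it scales linearly with the number of rows, unlike the loop, which re-subsets the tail of the frame on every iteration.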
Upvotes: 3
Reputation: 66834
You can use lapply to create your subsets, and sapply to loop over these, but I'd wager there would be a quicker way.
sapply(lapply(1:3,":",3),function(x) with(dat[x,],weighted.mean(X1,X2)))
[1] 1.800000 2.333333 3.000000
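To unpack the one-liner: lapply(1:3, ":", 3) calls the `:` operator on each starting point, producing the list of index windows list(1:3, 2:3, 3:3); sapply then computes the weighted mean over each window. A sketch with the intermediate list made explicit (the frame is named dat, as above):

```r
# The question's example frame, named dat to match the answer
dat <- data.frame(matrix(c(1, 2, 3, 2, 2, 1), 3, 2))

idx <- lapply(1:3, ":", 3)  # list(1:3, 2:3, 3:3): one window per starting point
res <- sapply(idx, function(x) with(dat[x, ], weighted.mean(X1, X2)))
res
# [1] 1.800000 2.333333 3.000000
```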
Upvotes: 1