Reputation: 597
I have a variable (Var.1) that is recorded longitudinally. I want to calculate a cumulative average (or "normal") variable as time progresses. The kicker is that I only want to update the cumulative average when the Var.1 satisfies a condition, in this case it is being >70% of the previous normal. If it satisfies then it should update, if not then the previous value should be carried forward. I was looking for a vectorized solution but am not sure if this is possible.
My sample data is below. I have entered in what the normal should be after excluding the 5th entry (50) given that it does not satisfy the >70% rule.
library(tibble)
Sample.GT = tibble(Var.1 = c(80, 80,90,90,50,80,70, 80,80,80),
Normal = c(80, 80,83.33,85,85,84,81.67,81.43,81.25,81.11))
Dplyr or data.table solutions are preferred. I am looking to implement this with groups across a large data set so a quick solution is ideal.
Upvotes: 0
Views: 75
Reputation: 25225
A possible recursive approach in data.table
:
n <- 1
cs <- GT$Var.1[1L]
GT[1L, cm := cs]
GT[-1L, cm := {
if (Var.1 > 0.7*cs/n) {
cs <- cs + Var.1
n <- n + 1
}
cs / n
}, seq_len(GT[,.N])[-1L]]
or using Rcpp
which will be faster:
library(Rcpp)
calcNorm <- cppFunction('
NumericVector calcNorm(NumericVector v) {
int sz = v.size();
double n = 1.0, cs = v[0];
NumericVector ret(sz);
ret[0] = cs;
for (int i = 1; i < sz; i++) {
if (v[i] > 0.7*cs/n) {
cs = cs + v[i];
n = n + 1.0;
}
ret[i] = cs / n;
}
return(ret);
}
')
GT[, newNormal := calcNorm(Var.1)]
output:
Var.1 Normal cm
1: 80 80.00 80.00000
2: 80 80.00 80.00000
3: 90 83.33 83.33333
4: 90 85.00 85.00000
5: 50 85.00 85.00000
6: 80 84.00 84.00000
7: 70 81.67 81.66667
8: 80 81.43 81.42857
9: 80 81.25 81.25000
10: 80 81.11 81.11111
data:
library(data.table)
GT = data.table(Var.1 = c(80, 80,90,90,50,80,70, 80,80,80),
Normal = c(80, 80,83.33,85,85,84,81.67,81.43,81.25,81.11))
edited thanks to sindri_baldur's comment
Upvotes: 1