Reputation: 438
I have a data table that looks like:
library(data.table)
dt = data.table(id=c(1,1,1,2,2,2),date=rep(c("2017-10-22", "2017-10-29",
"2017-11-05"),2),value=c(0,2,0,1,0,0))
dt:
id date value
1 2017-10-22 0
1 2017-10-29 2
1 2017-11-05 0
2 2017-10-22 1
2 2017-10-29 0
2 2017-11-05 0
I want to smooth out the values using the function smooth = 1/2 * value + 1/4 * (previous value) + 1/4 * (last value), with appropriate boundary conditions, ending up with something like:
dt:
id date value smooth
1 2017-10-22 0 0.5
1 2017-10-29 2 1
1 2017-11-05 0 0.5
2 2017-10-22 1 0.75
2 2017-10-29 0 0.25
2 2017-11-05 0 0
I'm unsure of a good way to do this over many ids, even if I nest for loops (which I'd think is avoidable anyway). I think I want something like:
dates = unique(dt$date)
dt[date=dates[1],smooth := 3/4 * value + 1/4 * dt[date=dates[2],value]]
for ( i in 2:(length(dates)-1)) {
dt[date=dates[i],smooth := 1/2 * value + 1/4 * dt[date=dates[i-1],value]+ 1/4 * dt[date=dates[i+1],value]]
}
dt[date=dates[length(dates)],smooth := 3/4 * value + 1/4 * dt[date=dates[length(dates)-1],value]]
But again, I'm getting messed up by having to keep the ids straight. Any advice?
Upvotes: 1
Views: 338
Reputation: 438
Based on thecatalyst's answer above, assuming the dates are the same for all ids, I can solve this with:
f_date = min(dt$date)
l_date = max(dt$date)
dt[, c("lead1", "lag1") := list(shift(value, n=1, fill=0, type="lead"),
+ shift(value, n=1, fill=0, type="lag"))]
dt[, smooth := 1/2 * value + 1/4 * lead1 + 1/4 * lag1]
dt[date == f_date,smooth := 3/4*value + 1/4*lead1]
dt[date == l_date,smooth := 3/4*value + 1/4*lag1]
and to clean up:
dt = dt[,c(-4,-5)]
Upvotes: 1
Reputation: 28391
This might not the most efficient way to do it if you have big data but I'd use shift
function to create lag
and lead
columns and use those in the calculation
dt[, c("lead1", "lag1") := list(shift(value, n=1, fill=0, type="lead"),
shift(value, n=1, fill=0, type="lag"))]
dt[1, smooth := 3/4 * value + 1/4 * lag1, by = .(id)]
dt[.N, smooth := 3/4 * value + 1/4 * lead1, by = .(id)]
dt[, smooth := 1/2 * value + 1/4 * lead1 + 1/4 * lag1, by = .(id)]
> dt
id date value lead1 lag1 smooth
1: 1 2017-10-22 0 2 0 0.50
2: 1 2017-10-29 2 0 0 1.00
3: 1 2017-11-05 0 1 2 0.75
4: 2 2017-10-22 1 0 0 0.50
5: 2 2017-10-29 0 0 1 0.25
6: 2 2017-11-05 0 0 0 0.00
Upvotes: 1