Sinnombre
Sinnombre

Reputation: 438

Smooth a column

I have a data table that looks like:

library(data.table)
dt = data.table(id=c(1,1,1,2,2,2),date=rep(c("2017-10-22", "2017-10-29", 
                                 "2017-11-05"),2),value=c(0,2,0,1,0,0))

dt:
id  date        value
1   2017-10-22      0
1   2017-10-29      2
1   2017-11-05      0
2   2017-10-22      1
2   2017-10-29      0
2   2017-11-05      0

I want to smooth out the values using the function smooth = 1/2 * value + 1/4 * (previous value) + 1/4 * (last value), with appropriate boundary conditions, ending up with something like:

dt:
id  date        value  smooth
1   2017-10-22      0     0.5
1   2017-10-29      2       1
1   2017-11-05      0     0.5
2   2017-10-22      1    0.75
2   2017-10-29      0    0.25
2   2017-11-05      0       0

I'm unsure of a good way to do this over many ids, even if I nest for loops (which I'd think is avoidable anyway). I think I want something like:

dates = unique(dt$date)
dt[date=dates[1],smooth := 3/4 * value + 1/4 * dt[date=dates[2],value]]

for ( i in 2:(length(dates)-1)) {
  dt[date=dates[i],smooth := 1/2 * value + 1/4 * dt[date=dates[i-1],value]+ 1/4 * dt[date=dates[i+1],value]]
}
dt[date=dates[length(dates)],smooth := 3/4 * value + 1/4 * dt[date=dates[length(dates)-1],value]]

But again, I'm getting messed up by having to keep the ids straight. Any advice?

Upvotes: 1

Views: 338

Answers (2)

Sinnombre
Sinnombre

Reputation: 438

Based on thecatalyst's answer above, assuming the dates are the same for all ids, I can solve this with:

f_date = min(dt$date)
l_date = max(dt$date)
dt[, c("lead1", "lag1") := list(shift(value, n=1, fill=0, type="lead"),   
+                                 shift(value, n=1, fill=0, type="lag"))]  
dt[, smooth := 1/2 * value + 1/4 * lead1 + 1/4 * lag1]
dt[date == f_date,smooth := 3/4*value + 1/4*lead1] 
dt[date == l_date,smooth := 3/4*value + 1/4*lag1]

and to clean up:

dt = dt[,c(-4,-5)]

Upvotes: 1

Tung
Tung

Reputation: 28391

This might not the most efficient way to do it if you have big data but I'd use shift function to create lag and lead columns and use those in the calculation

dt[, c("lead1", "lag1") := list(shift(value, n=1, fill=0, type="lead"), 
                                shift(value, n=1, fill=0, type="lag"))]
dt[1,  smooth := 3/4 * value + 1/4 * lag1, by = .(id)]
dt[.N, smooth := 3/4 * value + 1/4 * lead1, by = .(id)]
dt[, smooth := 1/2 * value + 1/4 * lead1 + 1/4 * lag1, by = .(id)]

> dt
   id       date value lead1 lag1 smooth
1:  1 2017-10-22     0     2    0   0.50
2:  1 2017-10-29     2     0    0   1.00
3:  1 2017-11-05     0     1    2   0.75
4:  2 2017-10-22     1     0    0   0.50
5:  2 2017-10-29     0     0    1   0.25
6:  2 2017-11-05     0     0    0   0.00

Upvotes: 1

Related Questions