Reputation: 83215
Suppose I have the following dataset:
> dat
value id
1: 2.785300 1
2: 4.164371 1
3: 4.179532 1
4: 4.373546 1
5: 4.378759 1
6: 4.694612 1
7: 4.955066 2
8: 4.983810 2
9: 5.183643 2
10: 5.329508 2
11: 5.389843 2
12: 5.487429 2
13: 5.575781 3
14: 5.738325 3
15: 5.943836 3
16: 6.124931 3
17: 6.511781 3
18: 6.595281 3
which can be created with:
set.seed(1)
library(data.table)
dat <- data.table(value=round(rnorm(18, mean = 5, sd = 1),6))[order(value)][,id:=rep(1:3, each=6)]
Next thing I did is creating a variable which indicates in which row a new id
occurs first and give that row a specific changeid
with all other rows having NA values with:
dat[, changeid:=ifelse(+(!(id==shift(id,n=1L,type="lag")))==1,1,NA)
][, changeid:=rleid(changeid)[changeid==1]]
which gives:
> dat
value id changeid
1: 2.785300 1 NA
2: 4.164371 1 NA
3: 4.179532 1 NA
4: 4.373546 1 NA
5: 4.378759 1 NA
6: 4.694612 1 NA
7: 4.955066 2 2
8: 4.983810 2 NA
9: 5.183643 2 NA
10: 5.329508 2 NA
11: 5.389843 2 NA
12: 5.487429 2 NA
13: 5.575781 3 4
14: 5.738325 3 NA
15: 5.943836 3 NA
16: 6.124931 3 NA
17: 6.511781 3 NA
18: 6.595281 3 NA
Now I want to creat two new variables:
window
which has the same changeid
value for all rows within a certain boundary of value
(e.g.: within 0.2
of the row which has a changeid
value). The other rows have NA
's.iddif
with the difference with the value
for row which has a changeid
. The other rows have NA
's.The desired result:
> dat
value id changeid window iddif
1: 2.785300 1 NA NA NA
2: 4.164371 1 NA NA NA
3: 4.179532 1 NA NA NA
4: 4.373546 1 NA NA NA
5: 4.378759 1 NA NA NA
6: 4.694612 1 NA NA NA
7: 4.955066 2 2 2 0.000000
8: 4.983810 2 NA 2 0.028744
9: 5.183643 2 NA NA NA
10: 5.329508 2 NA NA NA
11: 5.389843 2 NA 4 -0.185938
12: 5.487429 2 NA 4 -0.088352
13: 5.575781 3 4 4 0.000000
14: 5.738325 3 NA 4 0.162544
15: 5.943836 3 NA NA NA
16: 6.124931 3 NA NA NA
17: 6.511781 3 NA NA NA
18: 6.595281 3 NA NA NA
Any ideas how to get to this desired result?
Bonus question: How do get the changeid
to start at 1
and then increment with 1
for the next occurance of an id-change?
It would be nice if the solution(s) make use of data.table
as well.
Upvotes: 3
Views: 167
Reputation: 92282
Here's a possible solution using foverlaps
First, I would create changeid
as follows (as per the bonus)
dat[c(0L, diff(id)) == 1L, changeid := 1:.N]
Then, I would create a temp data set where changeid
isn't NA
, key it, create interval columns in dat
and run foverlaps
over them. Then, extract matched rows and update the original data
temp <- dat[!is.na(changeid), .(start = value, end = value)] # temp data
dat[, `:=`(start = value - 0.2, end = value + 0.2)] # set boundries
setkey(temp) # key the smaller data
res <- foverlaps(dat, temp, which = TRUE, nomatch = 0L) # get matched incidents
dat[res$xid, `:=`(window = res$yid, iddif = temp$start[res$yid])] # update values
dat[!is.na(window), iddif := value - iddif] # calculate differences
dat
# value id changeid start end window iddif
# 1: 2.785300 1 NA 2.585300 2.985300 NA NA
# 2: 4.164371 1 NA 3.964371 4.364371 NA NA
# 3: 4.179532 1 NA 3.979532 4.379532 NA NA
# 4: 4.373546 1 NA 4.173546 4.573546 NA NA
# 5: 4.378759 1 NA 4.178759 4.578759 NA NA
# 6: 4.694612 1 NA 4.494612 4.894612 NA NA
# 7: 4.955066 2 1 4.755066 5.155066 1 0.000000
# 8: 4.983810 2 NA 4.783810 5.183810 1 0.028744
# 9: 5.183643 2 NA 4.983643 5.383643 NA NA
# 10: 5.329508 2 NA 5.129508 5.529508 NA NA
# 11: 5.389843 2 NA 5.189843 5.589843 2 -0.185938
# 12: 5.487429 2 NA 5.287429 5.687429 2 -0.088352
# 13: 5.575781 3 2 5.375781 5.775781 2 0.000000
# 14: 5.738325 3 NA 5.538325 5.938325 2 0.162544
# 15: 5.943836 3 NA 5.743836 6.143836 NA NA
# 16: 6.124931 3 NA 5.924931 6.324931 NA NA
# 17: 6.511781 3 NA 6.311781 6.711781 NA NA
# 18: 6.595281 3 NA 6.395281 6.795281 NA NA
(You can remove start
and end
if don't like them)
Upvotes: 4