Reputation: 3384
data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4),
"timeHAVE"=c(1,4,7,10,2,5,NA,11,6,NA,NA,NA,3),
"timeWANT"=c(1,4,7,10,2,5,8,11,6,9,12,15,3))
library(dplyr);library(tidyverse)
data$timeWANTattempt=data$timeHAVE
data <- data %>%
group_by(student) %>%
fill(timeWANTattempt)+3
I have 'timeHAVE' and I want to replace missing times with the previous time +3. I show my dplyr attempt but it does not work. I seek a data.table solution. Thank you.
Upvotes: 1
Views: 72
Reputation: 25225
Another data.table
option without grouping:
setDT(data)[, w := fifelse(is.na(timeHAVE) & student==shift(student),
nafill(timeHAVE, "locf") + 3L * rowid(rleid(timeHAVE)),
timeHAVE)]
output:
student timeHAVE timeWANT w
1: 1 1 1 1
2: 1 4 4 4
3: 1 7 7 7
4: 1 10 10 10
5: 2 2 2 2
6: 2 5 5 5
7: 2 NA 8 8
8: 2 11 11 11
9: 3 6 6 6
10: 3 NA 9 9
11: 3 NA 12 12
12: 3 NA 15 15
13: 4 NA NA NA
14: 4 3 3 3
data with student=4 having NA for the first timeHAVE:
data = data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4),
"timeHAVE"=c(1,4,7,10,2,5,NA,11,6,NA,NA,NA,NA,3),
"timeWANT"=c(1,4,7,10,2,5,8,11,6,9,12,15,NA,3))
Upvotes: 0
Reputation: 34703
Here's an approach using 'locf'
filling
setDT(data)
data[ , by = student, timeWANT := {
# carry previous observations forward whenever missing
locf_fill = nafill(timeHAVE, 'locf')
# every next NA, the amount shifted goes up by another 3
na_shift = cumsum(idx <- is.na(timeHAVE))
# add the shift, but only where the original data was missing
locf_fill[idx] = locf_fill[idx] + 3*na_shift[idx]
# return the full vector
locf_fill
}]
Warning that this won't work if a given student
can have more than one non-consecutive set of NA
values in timeHAVE
Upvotes: 1
Reputation: 17648
you can try.
data %>%
group_by(student) %>%
mutate(n_na = cumsum(is.na(timeHAVE))) %>%
mutate(timeHAVE = ifelse(is.na(timeHAVE), timeHAVE[n_na == 0 & lead(n_na) == 1] + 3*n_na, timeHAVE))
student timeHAVE timeWANT n_na
<dbl> <dbl> <dbl> <int>
1 1 1 1 0
2 1 4 4 0
3 1 7 7 0
4 1 10 10 0
5 2 2 2 0
6 2 5 5 0
7 2 8 8 1
8 2 11 11 1
9 3 6 6 0
10 3 9 9 1
11 3 12 12 2
12 3 15 15 3
13 4 3 3 0
I included the little helper n_na
which counts NA
's in a row. Then the second mutate muliplies the number of NAs with three and adds this to the first non-NA
element before NA
's
Upvotes: 3