Reputation: 3670
Consider a dataframe like this:
dat <- data.frame(name = c("John", "John", "John", "John", "Sam", "Sam", "Sam", "Sam", "Emily", "Emily", "Emily", "Emily", "Alex", "Alex", "Alex", "Alex"),
grade = c(9, NA, NA, 12, 10, NA, NA, NA, 9, 10, 10, 11, NA, 11, 11, NA),
year = c(2007, 2008, 2009, 2010, rep(c(2018, 2019, 2019, 2020), 3)))
name grade year
1 John 9 2007
2 John NA 2008
3 John NA 2009
4 John 12 2010
5 Sam 10 2018
6 Sam NA 2019
7 Sam NA 2019
8 Sam NA 2020
9 Emily 9 2018
10 Emily 10 2019
11 Emily 10 2019
12 Emily 11 2020
13 Alex NA 2018
14 Alex 11 2019
15 Alex 11 2019
16 Alex NA 2020
I'd like to fill in the missing NAs in grade
based on year
. So Sam
for example is in 10th grade in 2018, and in 2019 he'll be in 11th grade:
dat_desired <- data.frame(name = c("John", "John", "John", "John", "Sam", "Sam", "Sam", "Sam", "Emily", "Emily", "Emily", "Emily", "Alex", "Alex", "Alex", "Alex"),
grade = c(9, 10, 11, 12, 10, 11, 11, 12, 9, 10, 10, 11, 10, 11, 11, 12),
year = c(2007, 2008, 2009, 2010, rep(c(2018, 2019, 2019, 2020), 3)))
name grade year
1 John 9 2007
2 John 10 2008
3 John 11 2009
4 John 12 2010
5 Sam 10 2018
6 Sam 11 2019
7 Sam 11 2019
8 Sam 12 2020
9 Emily 9 2018
10 Emily 10 2019
11 Emily 10 2019
12 Emily 11 2020
13 Alex 10 2018
14 Alex 11 2019
15 Alex 11 2019
16 Alex 12 2020
How can I do this? I've tried:
dat %>%
group_by(name) %>%
fill(grade, .direction = "updown")
which doesn't work because it doesn't take year
into account. Adding a group_by(year)
doesn't help either, because then no NA
s are replaced.
Upvotes: 4
Views: 57
Reputation: 752
A rough solution can be implemented with a for loop:
X <- as.character(unique(dat$name))
for ( i in 1:length(X)){
a <- min(dat[dat$name==X[i],]$grade, na.rm = T)
b <- min(dat[dat$name==X[i],]$year, na.rm = T)
dat[dat$name==X[i] & is.na(dat$grade),]$grade <- dat[dat$name==X[i] & is.na(dat$grade),]$year-b+a
}
Upvotes: 1
Reputation: 14774
Could try:
dat %>%
group_by(name) %>%
mutate(
grade = coalesce(grade, first(grade[!is.na(grade)]) - (first(year[!is.na(grade)]) - year))
)
Output:
# A tibble: 16 x 3
# Groups: name [4]
name grade year
<fct> <dbl> <dbl>
1 John 9 2007
2 John 10 2008
3 John 11 2009
4 John 12 2010
5 Sam 10 2018
6 Sam 11 2019
7 Sam 11 2019
8 Sam 12 2020
9 Emily 9 2007
10 Emily 10 2008
11 Emily 10 2009
12 Emily 11 2010
13 Alex 10 2018
14 Alex 11 2019
15 Alex 11 2019
16 Alex 12 2020
Upvotes: 3