Greg
Greg

Reputation: 3670

Increment column based on second column

Consider a dataframe like this:

dat <- data.frame(name = c("John", "John", "John", "John", "Sam", "Sam", "Sam", "Sam", "Emily", "Emily", "Emily", "Emily", "Alex", "Alex", "Alex", "Alex"),
                  grade = c(9, NA, NA, 12, 10, NA, NA, NA, 9, 10, 10, 11, NA, 11, 11, NA),
                  year = c(2007, 2008, 2009, 2010, rep(c(2018, 2019, 2019, 2020), 3)))

    name grade year
1   John     9 2007
2   John    NA 2008
3   John    NA 2009
4   John    12 2010
5    Sam    10 2018
6    Sam    NA 2019
7    Sam    NA 2019
8    Sam    NA 2020
9  Emily     9 2018
10 Emily    10 2019
11 Emily    10 2019
12 Emily    11 2020
13  Alex    NA 2018
14  Alex    11 2019
15  Alex    11 2019
16  Alex    NA 2020

I'd like to fill in the missing NAs in grade based on year. So Sam for example is in 10th grade in 2018, and in 2019 he'll be in 11th grade:

dat_desired <- data.frame(name = c("John", "John", "John", "John", "Sam", "Sam", "Sam", "Sam", "Emily", "Emily", "Emily", "Emily", "Alex", "Alex", "Alex", "Alex"),
                  grade = c(9, 10, 11, 12, 10, 11, 11, 12, 9, 10, 10, 11, 10, 11, 11, 12),
                  year = c(2007, 2008, 2009, 2010, rep(c(2018, 2019, 2019, 2020), 3)))

    name grade year
1   John     9 2007
2   John    10 2008
3   John    11 2009
4   John    12 2010
5    Sam    10 2018
6    Sam    11 2019
7    Sam    11 2019
8    Sam    12 2020
9  Emily     9 2018
10 Emily    10 2019
11 Emily    10 2019
12 Emily    11 2020
13  Alex    10 2018
14  Alex    11 2019
15  Alex    11 2019
16  Alex    12 2020

How can I do this? I've tried:

dat %>% 
  group_by(name) %>% 
  fill(grade, .direction = "updown")

which doesn't work because it doesn't take year into account. Adding a group_by(year) doesn't help either, because then no NAs are replaced.

Upvotes: 4

Views: 57

Answers (2)

A rough solution can be implemented with a for loop:

X <- as.character(unique(dat$name))

for ( i in 1:length(X)){
a <- min(dat[dat$name==X[i],]$grade, na.rm = T)
b <- min(dat[dat$name==X[i],]$year, na.rm = T)
dat[dat$name==X[i] & is.na(dat$grade),]$grade <- dat[dat$name==X[i] & is.na(dat$grade),]$year-b+a
}

Upvotes: 1

arg0naut91
arg0naut91

Reputation: 14774

Could try:

dat %>%
  group_by(name) %>%
  mutate(
    grade = coalesce(grade, first(grade[!is.na(grade)]) - (first(year[!is.na(grade)]) - year))
  )

Output:

# A tibble: 16 x 3
# Groups:   name [4]
   name  grade  year
   <fct> <dbl> <dbl>
 1 John      9  2007
 2 John     10  2008
 3 John     11  2009
 4 John     12  2010
 5 Sam      10  2018
 6 Sam      11  2019
 7 Sam      11  2019
 8 Sam      12  2020
 9 Emily     9  2007
10 Emily    10  2008
11 Emily    10  2009
12 Emily    11  2010
13 Alex     10  2018
14 Alex     11  2019
15 Alex     11  2019
16 Alex     12  2020

Upvotes: 3

Related Questions