Reputation: 763
I have a data frame in the following format, which represents a larger data set:
F.names<-c('M','M','M','A','A')
L.names<-c('Ab','Ab','Ab','Ac','Ac')
year<-c('August 2015','September 2014','September 2016', 'August 2014','September 2013')
grade<-c(NA,'9th Grade','11th Grade',NA,'11th grade')
df.have<-data.frame(F.names,L.names,year,grade)
  F.names L.names           year      grade
1       M      Ab    August 2015       <NA>
2       M      Ab September 2014  9th Grade
3       M      Ab September 2016 11th Grade
4       A      Ac    August 2014       <NA>
5       A      Ac September 2013 11th grade
The year column is a factor in the original data set, and there are several missing values for grade. Basically, I want to fill in the missing grade values based on the year column so that it looks like the following:
  F.names L.names           year      grade
1       M      Ab    August 2015 10th Grade
2       M      Ab September 2014  9th Grade
3       M      Ab September 2016 11th Grade
4       A      Ac    August 2014 12th Grade
5       A      Ac September 2013 11th grade
I was thinking that my first step would be to convert the year column, which is a factor, to a date format, then arrange the rows in order and use something like fill from tidyr to fill in the missing values. How should I go about doing this, or is there a better way to approach this?
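For reference, the date-conversion step described above could look roughly like the sketch below. It uses only base R. The names parts, month_num, year_num and year_date are placeholders, and it assumes every value has the form "Month YYYY" with a full English month name.

```r
# Sketch only: turn "Month YYYY" strings stored as a factor into Date values.
year <- factor(c("August 2015", "September 2014", "September 2016"))

# Split each value into its month name and its year.
parts <- strsplit(as.character(year), " ", fixed = TRUE)

# Look up the month number in base R's month.name constant (English names),
# which avoids depending on the session locale.
month_num <- match(vapply(parts, `[`, character(1), 1), month.name)
year_num  <- as.integer(vapply(parts, `[`, character(1), 2))

# Use the first day of the month as a stand-in day so a full date can be built.
year_date <- as.Date(sprintf("%d-%02d-01", year_num, month_num))

# Ordering by the parsed dates puts the original values in chronological order.
year[order(year_date)]
```

Once the column is a real date, sorting within each student is straightforward.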
Upvotes: 1
Views: 79
Reputation: 16121
F.names<-c('M','M','M','A','A')
L.names<-c('Ab','Ab','Ab','Ac','Ac')
year<-c('August 2015','September 2014','September 2016', 'August 2014','September 2013')
grade<-c(NA,'9th Grade','11th Grade',NA,'11th grade')
df.have<-data.frame(F.names,L.names,year,grade)
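library(tidyverse)

df.have %>%
  # split "August 2015" into month (m) and year (y); keep the original column
  separate(year, c("m", "y"), convert = TRUE, remove = FALSE) %>%
  # split "9th Grade" into the number (num) and the label (type)
  separate(grade, c("num", "type"), sep = "th", convert = TRUE) %>%
  arrange(F.names, y) %>%
  group_by(F.names) %>%
  # a missing grade becomes the previous year's grade plus one
  mutate(num = ifelse(is.na(num), lag(num) + 1, num),
         type = "grade") %>%
  ungroup() %>%
  # rebuild the grade column, e.g. 10 and "grade" -> "10th grade"
  unite(grade, num, type, sep = "th ") %>%
  select(-m, -y)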
#   F.names L.names           year      grade
# 1       A      Ac September 2013 11th grade
# 2       A      Ac    August 2014 12th grade
# 3       M      Ab September 2014  9th grade
# 4       M      Ab    August 2015 10th grade
# 5       M      Ab September 2016 11th grade
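This solution assumes that you won't have 2 or more consecutive NAs for a given F.names value. It also needs the earliest row of each F.names value to have a known grade, because lag() has no previous row to look at there.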
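If runs of NAs can occur, one possible workaround is to anchor each student on a known grade and shift it by the difference in years. The sketch below is base R only and not tested beyond the example data. The helper fill_group and the variables y, num and filled are hypothetical names. It assumes one grade per calendar year, that every grade takes the "th" suffix, and that a student is identified by F.names plus L.names.

```r
df <- data.frame(
  F.names = c("M", "M", "M", "A", "A"),
  L.names = c("Ab", "Ab", "Ab", "Ac", "Ac"),
  year    = c("August 2015", "September 2014", "September 2016",
              "August 2014", "September 2013"),
  grade   = c(NA, "9th Grade", "11th Grade", NA, "11th grade"),
  stringsAsFactors = FALSE
)

# Calendar year and numeric grade pulled out of the text columns.
y   <- as.integer(sub(".* ", "", df$year))
num <- as.integer(sub("th.*", "", df$grade))

# Hypothetical helper: given the row indices of one student, use the first
# row with a known grade as a reference and offset it by the year difference.
fill_group <- function(idx) {
  known <- idx[!is.na(num[idx])]
  if (length(known) == 0) return(rep(NA_integer_, length(idx)))
  ref <- known[1]
  num[ref] + (y[idx] - y[ref])
}

# Apply the helper per student; ave() returns results in the original row order.
filled <- ave(seq_along(num), paste(df$F.names, df$L.names), FUN = fill_group)

# Only overwrite rows that were missing and for which a value could be derived.
df$grade <- ifelse(is.na(df$grade) & !is.na(filled),
                   paste0(filled, "th Grade"), df$grade)
df
```

Because the reference row can sit anywhere in the group, this also covers a missing first record, and the existing grade strings are left untouched.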
Upvotes: 2