Reputation: 763
I have the following data.frame of students who joined a specific program:
library(data.table)
f.name<-c('a','a','b','b','b','c','c')
year<-c(2014,2015,2013,2014,2015,2015,2016)
grade<-c(9,10,8,9,10,7,8)
f.name<-as.character(f.name)
df.have<-data.frame(f.name,year,grade)
df.have
I'm specifically interested in 9th graders who joined the program in 2014. However, I want to distinguish between 9th graders who joined the program for the first time in 2014 and 9th graders who are returning to it (those who were 8th graders in 2013).
I was able to create a column flagging 9th graders who joined the program for the first time in 2014 as follows:
df.have$new.students<-with(df.have, rowid(f.name) == 1 & year == 2014 & grade == 9)
df.have
  f.name year grade new.students
1      a 2014     9         TRUE
2      a 2015    10        FALSE
3      b 2013     8        FALSE
4      b 2014     9        FALSE
5      b 2015    10        FALSE
6      c 2015     7        FALSE
7      c 2016     8        FALSE
How can I create another column to tag returning students, i.e. those who were in 8th grade in 2013 and returned in 2014, so that the result looks like this?
  f.name year grade new.students returning.students
1      a 2014     9         TRUE              FALSE
2      a 2015    10        FALSE              FALSE
3      b 2013     8        FALSE              FALSE
4      b 2014     9        FALSE               TRUE
5      b 2015    10        FALSE              FALSE
6      c 2015     7        FALSE              FALSE
7      c 2016     8        FALSE              FALSE
Upvotes: 1
Reputation: 66819
You can use a join to look up the desired rows:
library(data.table)
setDT(df.have)
# initialize to FALSE
df.have[, rs := FALSE]
# update to TRUE if the desired row is found
df.have[year == 2014 & grade == 9, rs :=
  df.have[replace(copy(.SD), c("year", "grade"), list(2013, 8)),
          on = .(f.name, year, grade), .N, by = .EACHI]$N > 0L
]
This could also be done with by= and any or cumsum, but I guess it's less efficient:
df.have[, v :=
  year == 2014 & grade == 9 & any(year == 2013 & grade == 8)
, by = f.name]
# or...
df.have[order(year), v :=
  year == 2014 & grade == 9 & cumsum(year == 2013 & grade == 8)
, by = f.name]
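If the join syntax feels heavy, the same lookup idea can be sketched with a plain %in% test instead of a join. This is only an illustration on the question's example data; the names prior and returning.students are placeholders I picked, and it assumes f.name uniquely identifies a student.

```r
library(data.table)

# Example data from the question, built directly as a data.table
df.have <- data.table(
  f.name = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8)
)

# Students who were in the program as 8th graders in 2013
# (prior is a hypothetical helper name)
prior <- df.have[year == 2013 & grade == 8, unique(f.name)]

# Flag 2014 9th graders whose name appears in that lookup
df.have[, returning.students := year == 2014 & grade == 9 & f.name %in% prior]
```

Unlike the join, this builds the lookup vector once up front, which is probably fine for small data but I haven't compared the two for speed.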
Upvotes: 3
Reputation: 6776
If you're willing to use dplyr, you can do this with a group_by and take advantage of the row_number() function:
library(dplyr)
df.have %>%
group_by(f.name) %>%
mutate(new_student = (grade == 9 & year == 2014 & row_number() == 1),
returning_student = (grade == 9 & year == 2014 & row_number() > 1)) %>%
ungroup()
  f.name  year grade new_student returning_student
  <fct>  <dbl> <dbl> <lgl>       <lgl>
1 a       2014     9 TRUE        FALSE
2 a       2015    10 FALSE       FALSE
3 b       2013     8 FALSE       FALSE
4 b       2014     9 FALSE       TRUE
5 b       2015    10 FALSE       FALSE
6 c       2015     7 FALSE       FALSE
7 c       2016     8 FALSE       FALSE
Unfortunately, I'm not well versed in data.table, so I can't provide an answer specific to that package.
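One caveat: row_number() > 1 only checks that the 2014 row is not the student's first row, so it depends on the rows being sorted by year within each student, and it never checks that the earlier row was really 8th grade in 2013. A stricter variant might look like the sketch below; was_8th_in_2013 and df.tagged are names made up for illustration, and treating "first time" as "earliest year on record" is my assumption.

```r
library(dplyr)

# Example data from the question
df.have <- data.frame(
  f.name = c('a', 'a', 'b', 'b', 'b', 'c', 'c'),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8)
)

df.tagged <- df.have %>%
  group_by(f.name) %>%
  mutate(
    # TRUE for every row of a student who has a 2013 / grade 8 record
    was_8th_in_2013   = any(year == 2013 & grade == 8),
    # new: a 2014 9th grader whose earliest recorded year is 2014
    new_student       = grade == 9 & year == 2014 & year == min(year),
    # returning: a 2014 9th grader who was an 8th grader in 2013
    returning_student = grade == 9 & year == 2014 & was_8th_in_2013
  ) %>%
  ungroup() %>%
  select(-was_8th_in_2013)
```

Because both flags are computed from any() and min() over the whole group, the result does not depend on the row order.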
Upvotes: 0