How to insert a logical value based on whether a row has a previous value?

Question

I have the following data.frame of students who joined a specific program:

library(data.table)

f.name<-c('a','a','b','b','b','c','c')
year<-c(2014,2015,2013,2014,2015,2015,2016)
grade<-c(9,10,8,9,10,7,8)

f.name<-as.character(f.name)

df.have<-data.frame(f.name,year,grade)
df.have

I'm specifically interested in 9th graders who joined the program in 2014. However, I want to distinguish between 9th graders who joined the program in 2014 for the first time and 9th graders who are returning to it (that is, who were 8th graders in the program in 2013).

I was able to create a column that flags 9th graders who joined the program for the first time in 2014, in the following manner:

df.have$new.students<-with(df.have, rowid(f.name) == 1 & year == 2014 & grade == 9)
df.have
  f.name year grade new.students
1      a 2014     9         TRUE
2      a 2015    10        FALSE
3      b 2013     8        FALSE
4      b 2014     9        FALSE
5      b 2015    10        FALSE
6      c 2015     7        FALSE
7      c 2016     8        FALSE

How can I create another column to tag returning students, i.e. those who were in 8th grade in 2013 and are returning in 2014, so that the result looks like this?

  f.name year grade new.students returning.students
1      a 2014     9         TRUE              FALSE
2      a 2015    10        FALSE              FALSE
3      b 2013     8        FALSE              FALSE
4      b 2014     9        FALSE               TRUE
5      b 2015    10        FALSE              FALSE
6      c 2015     7        FALSE              FALSE
7      c 2016     8        FALSE              FALSE

Frank · Accepted Answer

You can use a join to look up the desired rows (here the result is stored in a column named rs):

library(data.table)
setDT(df.have)

# initialize to FALSE
df.have[, rs := FALSE]

# update to TRUE if the desired row is found
df.have[year == 2014 & grade == 9, rs := 
  df.have[replace(copy(.SD), c("year", "grade"), list(2013, 8)), on=.(f.name, year, grade), .N, by=.EACHI]$N > 0L
]
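To make the join easier to follow, here is a minimal sketch that unpacks it into separate steps on the question's sample data. The intermediate names lookup and matches are hypothetical and only used for illustration; the logic is intended to mirror the one-liner above, not replace it.

```r
library(data.table)

# Sample data from the question
df.have <- data.table(
  f.name = c("a", "a", "b", "b", "b", "c", "c"),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8)
)

# Step 1: for every 9th grader in 2014, describe the row we hope to find:
# the same student as an 8th grader in 2013 (lookup is a hypothetical name)
lookup <- df.have[year == 2014 & grade == 9,
                  .(f.name, year = 2013, grade = 8)]

# Step 2: join the lookup rows against the full table and count, for each
# lookup row, how many rows of df.have it matches
matches <- df.have[lookup, on = .(f.name, year, grade), .N, by = .EACHI]

# Step 3: default to FALSE, then flag the 2014 9th graders with a match
df.have[, rs := FALSE]
df.have[year == 2014 & grade == 9, rs := matches$N > 0L]

print(df.have)
```

Because by = .EACHI returns one row per row of lookup, in the same order, matches$N lines up with the rows selected by year == 2014 & grade == 9.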

This could also be done with by= and any() or cumsum() (storing the result in a column named v), but I guess it's less efficient than the join:

df.have[, v := 
  year == 2014 & grade == 9 & any(year == 2013 & grade == 8)
, by=f.name]

# or, counting only 2013 rows that come earlier in year order...
df.have[order(year), v := 
  year == 2014 & grade == 9 & cumsum(year == 2013 & grade == 8) > 0L
, by=f.name]
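For readers who want the grouped any() idea without data.table, the following is a rough base R sketch using ave(). It is an alternative written for illustration, not part of the answer above; the helper name was.8th.in.2013 is hypothetical, and the output column is named returning.students to match the question.

```r
# Sample data from the question, as a plain data.frame
df.have <- data.frame(
  f.name = c("a", "a", "b", "b", "b", "c", "c"),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8),
  stringsAsFactors = FALSE
)

# Per-student flag, repeated on every row of that student:
# does the student have an 8th-grade row in 2013?
was.8th.in.2013 <- as.logical(ave(
  df.have$year == 2013 & df.have$grade == 8,
  df.have$f.name,
  FUN = any
))

# A returning student is a 2014 9th grader who has that earlier row
df.have$returning.students <-
  df.have$year == 2014 & df.have$grade == 9 & was.8th.in.2013

print(df.have)
```

Like the any() version above, this does not depend on row order, since the flag is computed over all rows of each student.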
