Nathan123

Reputation: 763

How to insert a logical value based on whether a group has a previous value?

I have the following data.frame of students who joined a specific program:

library(data.table)

f.name<-c('a','a','b','b','b','c','c')
year<-c(2014,2015,2013,2014,2015,2015,2016)
grade<-c(9,10,8,9,10,7,8)

f.name<-as.character(f.name)

df.have<-data.frame(f.name,year,grade)
df.have

I'm specifically interested in 9th graders who joined the program in 2014. However, I want to distinguish between 9th graders who joined the program for the first time in 2014 and 9th graders who are returning to it (that is, who were 8th graders in the program in 2013).

I was able to create a column that flags 9th graders who joined the program for the first time in 2014, as follows:

df.have$new.students<-with(df.have, rowid(f.name) == 1 & year == 2014 & grade == 9)
df.have
  f.name year grade new.students
1      a 2014     9         TRUE
2      a 2015    10        FALSE
3      b 2013     8        FALSE
4      b 2014     9        FALSE
5      b 2015    10        FALSE
6      c 2015     7        FALSE
7      c 2016     8        FALSE

How can I create another column to tag returning students, meaning those who were in 8th grade in 2013 and are returning as 9th graders in 2014, so that the result looks like this?

  f.name year grade new.students returning.students
1      a 2014     9         TRUE              FALSE
2      a 2015    10        FALSE              FALSE
3      b 2013     8        FALSE              FALSE
4      b 2014     9        FALSE               TRUE
5      b 2015    10        FALSE              FALSE
6      c 2015     7        FALSE              FALSE
7      c 2016     8        FALSE              FALSE

Upvotes: 1

Views: 67

Answers (2)

Frank

Reputation: 66819

You can use a join to look up the desired rows:

library(data.table)
setDT(df.have)

# initialize to FALSE
df.have[, rs := FALSE]

# update to TRUE if the desired row is found
df.have[year == 2014 & grade == 9, rs := 
  df.have[replace(copy(.SD), c("year", "grade"), list(2013, 8)), on=.(f.name, year, grade), .N, by=.EACHI]$N > 0L
]
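
One way to read the join above: for each 2014 ninth-grade row, it asks whether a row with the same f.name, year 2013 and grade 8 exists. A minimal sketch of the same idea, written as an update join from the other direction, might look like this. It assumes the question's sample data, and the names dt and prior are made up for illustration:

```r
library(data.table)

# Rebuild the question's sample data as a data.table
dt <- data.table(
  f.name = c("a", "a", "b", "b", "b", "c", "c"),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8)
)

# Hypothetical helper table: one row per student who was an 8th grader in 2013,
# rewritten to the key (f.name, 2014, 9) of the row we want to flag
prior <- dt[year == 2013 & grade == 8, .(f.name, year = 2014, grade = 9)]

# Initialize to FALSE, then set TRUE only on rows of dt that match a key in prior
dt[, rs := FALSE]
dt[prior, on = .(f.name, year, grade), rs := TRUE]

dt
```

Keys in prior that have no matching row in dt are simply skipped by the update, so a student who left after 2013 would not cause an error.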

This could also be done with by= and either any() or cumsum(), though I'd guess it's less efficient:

df.have[, v := 
  year == 2014 & grade == 9 & any(year == 2013 & grade == 8)
, by=f.name]

# or...
df.have[order(year), v := 
  year == 2014 & grade == 9 & cumsum(year == 2013 & grade == 8)
, by=f.name]

Upvotes: 3

tblznbits

Reputation: 6776

If you're willing to use dplyr, you can do this with a group_by and take advantage of the row_number() function.

library(dplyr)
df.have %>% 
  group_by(f.name) %>% 
  mutate(new_student = (grade == 9 & year == 2014 & row_number() == 1), 
         returning_student = (grade == 9 & year == 2014 & row_number() > 1)) %>%
  ungroup()

  f.name  year grade new_student returning_student
  <fct>  <dbl> <dbl> <lgl>       <lgl>            
1 a       2014     9 TRUE        FALSE            
2 a       2015    10 FALSE       FALSE            
3 b       2013     8 FALSE       FALSE            
4 b       2014     9 FALSE       TRUE             
5 b       2015    10 FALSE       FALSE            
6 c       2015     7 FALSE       FALSE            
7 c       2016     8 FALSE       FALSE

Unfortunately, I'm not well versed in data.table, so I can't provide an answer specific to that package.
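
Note that row_number() > 1 flags any 2014 ninth grader who is not the first row in their group, whatever the earlier rows contain. If the 2013 eighth-grade condition needs to be checked explicitly, a sketch along these lines might work. It assumes the question's sample data, and df and res are illustrative names:

```r
library(dplyr)

# Rebuild the question's sample data
df <- data.frame(
  f.name = c("a", "a", "b", "b", "b", "c", "c"),
  year   = c(2014, 2015, 2013, 2014, 2015, 2015, 2016),
  grade  = c(9, 10, 8, 9, 10, 7, 8)
)

res <- df %>%
  group_by(f.name) %>%
  mutate(
    # first-ever row for the student, and that row is 9th grade in 2014
    new_student = grade == 9 & year == 2014 & row_number() == 1,
    # 9th grade in 2014, and the same student has an 8th-grade row in 2013
    returning_student = grade == 9 & year == 2014 &
      any(year == 2013 & grade == 8)
  ) %>%
  ungroup()

res
```

Unlike the row_number() version, the any() check does not depend on the rows being sorted by year within each student.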

Upvotes: 0
