Reputation: 113
I have a dataframe that shows the cut scores that relate to different performance levels (1 through 5) on state tests. The DF looks like this:
grade <- rep(1:2, each = 5)
performance_level <- rep(1:5, 2)
score_start <- c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180)
score_end <- c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)
df <- data.frame(grade, performance_level, score_start, score_end)
The problem is that the score_end in some rows is the same as the score_start in the next row (e.g., rows 1 and 2), so a first-grade student who scores 134 is duplicated and shows up as earning both performance level 1 and performance level 2. I would like to add 1 to the score_start in row 2 so that it becomes 135. This problem occurs in multiple rows (I have a large dataset). I've tried using dplyr's lead and lag, but I can't quite get them to behave the way I want. Here is the code I have tried so far:
Try #1:
df$score_start[which(df$score_start == lag(df$score_end))] <- df$score_start + 1
Try #2:
df <- df %>% mutate(score_start = ifelse(score_end == lead(score_start), score_start + 1, score_start))
Any help would be much appreciated.
Upvotes: 0
Views: 63
Reputation: 768
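You can loop over consecutive rows and, wherever a row's score_end equals the next row's score_start, add 1 to that next score_start: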
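# Walk through consecutive row pairs (i, i + 1)
for (i in seq_len(nrow(df) - 1)) {
  # Only compare rows in the same grade; bump the later start if it collides
  if (df$grade[i] == df$grade[i + 1] && df$score_end[i] == df$score_start[i + 1]) {
    df$score_start[i + 1] <- df$score_start[i + 1] + 1
  }
}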
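The same check can also be written without a loop, which may help on a large dataset. This is only a minimal base R sketch using the sample data from the question. The helper names prev_end, prev_grade and overlap are made up for illustration, and restricting the comparison to rows in the same grade is an assumption about the intended behavior.

```r
# Sample data from the question
df <- data.frame(
  grade = rep(1:2, each = 5),
  performance_level = rep(1:5, 2),
  score_start = c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180),
  score_end = c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)
)

n <- nrow(df)

# Previous row's score_end and grade (hypothetical helper vectors);
# the first row has no predecessor, so it gets NA
prev_end <- c(NA, df$score_end[-n])
prev_grade <- c(NA, df$grade[-n])

# TRUE where a row's start equals the previous row's end within the same grade
overlap <- !is.na(prev_end) & df$grade == prev_grade & df$score_start == prev_end

# Shift only the colliding starts up by 1
df$score_start[overlap] <- df$score_start[overlap] + 1
```

This assumes the rows are already sorted by grade and performance_level, as they are in the example.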
Upvotes: 1
Reputation: 163
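You could do it with data.table like this, comparing each score_start with the previous row's score_end within the same grade: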
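library(data.table)
df <- data.table(df)
# Previous row's score_end within each grade (NA for the first row of a grade)
df[, score_end2 := shift(score_end, 1), by = .(grade)]
# Add 1 where the start equals the previous end; leave all other rows unchanged
df[, score_start := ifelse(!is.na(score_end2) & score_start == score_end2, score_start + 1, score_start)]
# Drop the helper column
df[, score_end2 := NULL]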
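Since the question was attempting this with dplyr, the same idea might be sketched there as well. Try #2 in the question compares score_end with lead(score_start), which flags the row before the collision, so the +1 lands on the wrong row's score_start. Comparing score_start with lag(score_end) flags the row that actually needs to move. The following is a minimal sketch assuming dplyr is installed; prev_end is a hypothetical helper column that is dropped at the end.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  grade = rep(1:2, each = 5),
  performance_level = rep(1:5, 2),
  score_start = c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180),
  score_end = c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)
)

df <- df %>%
  group_by(grade) %>%
  mutate(
    # Previous row's score_end within the grade (NA for the first row of a grade)
    prev_end = lag(score_end),
    # Add 1 only where the start collides with the previous end
    score_start = if_else(!is.na(prev_end) & score_start == prev_end,
                          score_start + 1, score_start)
  ) %>%
  ungroup() %>%
  select(-prev_end)
```

Grouping by grade keeps the last row of one grade from being compared with the first row of the next.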
Upvotes: 1