ra_learns

Reputation: 113

Conditionally modifying dataframe column value

I have a dataframe that shows the cut scores that relate to different performance levels (1 through 5) on state tests. The DF looks like this:

grade <- rep(1:2, each = 5)
performance_level <- rep(1:5, 2)
score_start <- c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180)
score_end <- c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)

df <- data.frame(grade, performance_level, score_start, score_end)

The problem is that the score_end in some rows is the same as the score_start in the next row (e.g., rows 1 and 2), so a first-grade student who scores 134 will be duplicated and show up as earning both performance level 1 and performance level 2. I would like to add 1 to the score_start in row 2 so that it is 135. This problem occurs in multiple rows (I have a large dataset). I've tried using dplyr's lead and lag, but I can't quite get them to behave the way I want. Here is the code I have tried so far:

try #1

df$score_start[which(df$score_start == lag(df$score_end))] <- df$score_start + 1

try #2

df <- df %>% mutate(score_start = ifelse(score_end == lead(score_start), score_start + 1, score_start))

Any help would be much appreciated.

Upvotes: 0

Views: 63

Answers (2)

BetterCallMe

Reputation: 768

You can loop over the rows and bump the next row's score_start whenever it equals the current row's score_end:

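# Compare each row's score_end with the next row's score_start
for (i in seq_len(nrow(df) - 1)) {
  if (df$score_end[i] == df$score_start[i + 1]) {
    # Shift the next band's start up by 1 so the two bands no longer overlap
    df$score_start[i + 1] <- df$score_start[i + 1] + 1
  }
}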
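The loop above can also be written without iteration. The following is a minimal base R sketch, not part of the original answer; the helper names prev_end, prev_grade, and overlap are made up for illustration. Unlike the loop, it only compares rows within the same grade, which is an assumption about what is wanted.

```r
# Rebuild the example data from the question
grade <- rep(1:2, each = 5)
performance_level <- rep(1:5, 2)
score_start <- c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180)
score_end <- c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)
df <- data.frame(grade, performance_level, score_start, score_end)

n <- nrow(df)
# Previous row's score_end and grade; the first row has no predecessor
prev_end <- c(NA, df$score_end[-n])
prev_grade <- c(NA, df$grade[-n])

# TRUE where a row starts exactly at the previous row's end, within the same grade
overlap <- !is.na(prev_end) & df$grade == prev_grade & df$score_start == prev_end

# Add 1 only to the overlapping starts
df$score_start[overlap] <- df$score_start[overlap] + 1
```

On the example data this should change the starts in rows 2, 5, and 8 (134, 192, and 142) and leave the others alone.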

Upvotes: 1

sebastiann

Reputation: 163

One option is data.table: shift score_end down by one row within each grade, then compare it with score_start:

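library(data.table)

df <- as.data.table(df)
# Previous row's score_end within each grade (NA for the first row of a grade)
df[, score_end2 := shift(score_end, 1), by = .(grade)]
# Add 1 to score_start where it equals the previous row's score_end
df[, score_start := ifelse(!is.na(score_end2) & score_start == score_end2, score_start + 1, score_start)]
df[, score_end2 := NULL]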
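Since the question was attempting this with dplyr, the same per-grade comparison might look like the sketch below. It assumes dplyr is installed; prev_end is a temporary column name chosen for illustration. The key difference from try #2 in the question is that each row's score_start is compared with the previous row's score_end (lag), rather than the row's own score_end with the next row's score_start (lead).

```r
library(dplyr)

# Rebuild the example data from the question
grade <- rep(1:2, each = 5)
performance_level <- rep(1:5, 2)
score_start <- c(100, 134, 157, 170, 192, 100, 129, 142, 158, 180)
score_end <- c(134, 156, 169, 192, 220, 128, 142, 157, 179, 200)
df <- data.frame(grade, performance_level, score_start, score_end)

df <- df %>%
  group_by(grade) %>%
  mutate(
    # Previous row's score_end within the grade (NA for the first row)
    prev_end = lag(score_end),
    # Add 1 where this row starts exactly where the previous one ended
    score_start = if_else(!is.na(prev_end) & score_start == prev_end,
                          score_start + 1, score_start)
  ) %>%
  ungroup() %>%
  select(-prev_end)
```

Grouping by grade keeps the last band of one grade from being compared with the first band of the next.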

Upvotes: 1
