Reputation: 1979
I would like to recode a numerical variable based on a cut score criterion. If the cut scores are not available in the variable, I would like to recode the closest smaller value as a cut score. Here is a snapshot of dataset:
ids <- c(1,2,3,4,5,6,7,8,9,10)
scores <- c(512,531,541,555,562,565,570,572,573,588)
data <- data.frame(ids, scores)
> data
ids scores
1 1 512
2 2 531
3 3 541
4 4 555
5 5 562
6 6 565
7 7 570
8 8 572
9 9 573
10 10 588
cuts <- c(531, 560, 575)
The first cut score (531
) is in the dataset. So it will stay the same as 531
. However, 560
and 575
were not available. I would like to recode the closest smaller value (555
) to the second cut score as 560
in the new column, and for the third cut score, I'd like to recode 573
as 575
.
Here is what I would like to get.
ids scores rescored
1 1 512 512
2 2 531 531
3 3 541 541
4 4 555 560
5 5 562 562
6 6 565 565
7 7 570 570
8 8 572 572
9 9 573 575
10 10 588 588
Any thoughts? Thanks
Upvotes: 1
Views: 86
Reputation: 887048
One option would be to find the index with findInterval
and then get the pmax
of the 'scores' corresponding to that index with the 'cuts' and updated the 'rescored' column elements on that index
i1 <- with(data, findInterval(cuts, scores))
data$rescored <- data$scores
data$rescored[i1] <- with(data, pmax(scores[i1], cuts))
data
# ids scores rescored
#1 1 512 512
#2 2 531 531
#3 3 541 541
#4 4 555 560
#5 5 562 562
#6 6 565 565
#7 7 570 570
#8 8 572 572
#9 9 573 575
#10 10 588 588
Upvotes: 2