Replacing a string with a random number from vector

Question

I have a dataframe which includes grades of students per course. These grades, however, are in A-F format and need to be transformed to numerical grades (10-1). To do so, I generated random numbers that represent these A-F values.

A <- rnorm(nrow(Student_Data), 9.45, 0.2)
B <- rnorm(nrow(Student_Data), 7.95, 0.2)
C <- rnorm(nrow(Student_Data), 6.25, 0.2)
D <- rnorm(nrow(Student_Data), 4.75, 0.2)
F <- rnorm(nrow(Student_Data), 2, 0.2)

I also wrote code that replaces the letters with numbers:

courseGradesNumeric <- data.frame(lapply(courseGrades, function(x) {gsub("A", sample(A, 1), gsub("B", sample(B, 1), gsub("C", sample(C, 1), gsub("D", sample(D, 1), gsub("F", sample(F, 1), x)))))}))

This works quite well, but there is a problem: within a given column, every "A" (or any other letter) is replaced by the same random number drawn from vector A.

To illustrate:

Current dataframe (ignore the NAs for now)

Student_ID       ABC1000_Grade   ABC1003_Grade 
1    9000006           A              B          
2    9000014           A              A          
3    9000028           B              C          
4    9000045                               
5    9000080           C                       
6    9000091

The problem:

Student_ID       ABC1000_Grade   ABC1003_Grade 
1    9000006        9.335523      8.231295          
2    9000014        9.335523      9.462468          
3    9000028        7.972959      6.394259          
4    9000045                               
5    9000080        6.257297                   
6    9000091

In the ABC1000_Grade column, both A grades were replaced by the same random number (9.335523), taken from the vector generated in the earlier step.

How can I make sure that all replaced values are different random numbers? Thus, the preferred result should be:

Student_ID       ABC1000_Grade   ABC1003_Grade 
1    9000006        9.510445      8.231295          
2    9000014        9.335523      9.462468          
3    9000028        7.972959      6.394259          
4    9000045                               
5    9000080        6.257297                   
6    9000091

Shree · Accepted Answer

In your code, each gsub() call draws a single random value with sample() and uses it for every occurrence of a given grade in the column, which is why you get the same value.
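
A minimal sketch of that behaviour, assuming a made-up four-element column (the names grades, replaced, is_a and per_element, and the seed, are hypothetical and only serve the illustration):

```r
set.seed(1)                      # hypothetical seed, only for reproducibility
A <- rnorm(10, 9.45, 0.2)        # pool of candidate values, as in the question
grades <- c("A", "A", "B", "A")  # made-up example column

# sample(A, 1) is evaluated once, before gsub() runs, so gsub() receives a
# single replacement value and applies it to every element that matches.
replaced <- gsub("A", sample(A, 1), grades)
print(replaced)

# Drawing one value per matching element gives independent numbers instead.
is_a <- grades == "A"
per_element <- rep(NA_real_, length(grades))
per_element[is_a] <- rnorm(sum(is_a), 9.45, 0.2)
print(per_element)
```

The first result repeats one number for all three A entries; the second holds a separate draw for each.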

Here's a simpler way to get the desired result: use base::switch() inside sapply() to draw a fresh value for each element, then use dplyr to modify all columns ending with "Grade" in one go:

library(dplyr)

replace_grade <- function(g) {
  sapply(g, function(a) {
    switch(a,
         "A" = rnorm(1, 9.45, 0.2),
         "B" = rnorm(1, 7.95, 0.2),
         "C" = rnorm(1, 6.25, 0.2),
         "D" = rnorm(1, 4.75, 0.2),
         "F" = rnorm(1, 2, 0.2),
         NA_real_
         )
  })
}

# function output for illustration
replace_grade(g = c("A", "B", "C", "D", "F", NA_character_))
       A        B        C        D        F     <NA> 
9.229176 7.830536 6.239904 4.643644 2.146621       NA 

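# apply function to every column ending with "Grade"
# (mutate_at() is superseded in dplyr >= 1.0.0; the equivalent there is
#  mutate(across(ends_with("Grade"), replace_grade)))
df %>% 
  mutate_at(vars(ends_with("Grade")), replace_grade)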

  Student_ID ABC1000_Grade ABC1003_Grade
1    9000006      9.243239      7.946469
2    9000014      9.623083      9.072896
3    9000028      8.308868      6.177990
4    9000045            NA            NA
5    9000080      6.336819            NA
6    9000091            NA            NA
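
As an alternative sketch using base R only, the per-element draw can be vectorised: look up the mean for each grade, then call rnorm() once for all recognised entries. The names replace_grade_vec, grade_means, grades_df and grade_cols are hypothetical, and the small data frame is made up to mirror the example; the means and standard deviation are the ones from the question.

```r
# Hypothetical vectorised helper; means and sd are taken from the question.
replace_grade_vec <- function(g, sd = 0.2) {
  grade_means <- c(A = 9.45, B = 7.95, C = 6.25, D = 4.75, F = 2)
  # match() returns NA for blanks, NA and unknown letters
  mu <- unname(grade_means[match(g, names(grade_means))])
  out <- rep(NA_real_, length(g))
  ok <- !is.na(mu)
  # one independent draw per recognised grade
  out[ok] <- rnorm(sum(ok), mean = mu[ok], sd = sd)
  out
}

# Small made-up data frame mirroring the example
grades_df <- data.frame(
  Student_ID    = c(9000006, 9000014, 9000028, 9000045),
  ABC1000_Grade = c("A", "A", "B", ""),
  ABC1003_Grade = c("B", "A", "C", NA),
  stringsAsFactors = FALSE
)

# Convert every column whose name ends with "Grade"
grade_cols <- grep("Grade$", names(grades_df))
grades_df[grade_cols] <- lapply(grades_df[grade_cols], replace_grade_vec)
print(grades_df)
```

This avoids the element-by-element sapply() loop and returns a plain unnamed numeric vector, with NA wherever the input is blank or missing.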

Data -

df <- read.table(text = "Student_ID ABC1000_Grade   ABC1003_Grade
9000006 A   B
9000014 A   A
9000028 B   C
9000045     
9000080 C   
9000091     
", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
