Replacing values in multiple columns of a dataframe based on numeric ranges

Question

I have a dataframe of values for multiple variables, and I want to replace every numeric value with a letter that labels the numeric range it falls in. I do NOT want equal-width ranges, so as far as I understand, cut() is not an option.

In the following code, if I generate the dataframe and then run any one or two of the replacement commands, they do exactly what I want. But when I run them all together, the final table is filled entirely with "f" values.

#Generate test dataframe

test1 <- data.frame(replicate(10, sample(0:1000, 100, replace = TRUE)))

#Duplicate dataframe so you can go back and reality check category labels against original data

test<-data.frame(test1)

#These are my replacement commands

  test[test <10] <- "a"
  test[test >=10 & test <25] <- "b"
  test[test >=25 & test <50] <- "c"
  test[test >=50 & test <100] <- "d"
  test[test >=100 & test <500] <- "e"
  test[test >=500] <- "f"

Run any single replacement command on its own and you'll see the matching values replaced with the corresponding letter. All I want is for this to happen to every value in every column of this dataset. The ultimate purpose is to create a frequency table of the variables by the specified ranges.

akrun · Accepted Answer

We can use cut() to create the labels by specifying the breaks; the breaks do not need to be equally spaced. For multiple columns, use lapply() from base R to loop over the columns, apply cut() to each one, and assign the result back to the dataset of interest.

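# Run this on the untouched numeric data (e.g. a fresh copy of test1).
# right = FALSE makes each interval include its lower bound, matching the >= / < rules in the question.
test[] <- lapply(test, function(x) 
     cut(x, breaks = c(-Inf, 10, 25, 50, 100, 500, Inf), labels = letters[1:6], right = FALSE))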
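To connect this back to the stated goal, here is a minimal sketch that labels every column and then builds the frequency table. It assumes the ranges should follow the question's rules exactly (lower bound included, upper bound excluded), which is why it passes right = FALSE; cut() defaults to right = TRUE, which would put a value of exactly 10 in "a" rather than "b". The seed and the names breaks, labelled, edge and freq are illustrative and not from the original post.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
test1 <- data.frame(replicate(10, sample(0:1000, 100, replace = TRUE)))

# Unequal break points taken from the question's ranges
breaks <- c(-Inf, 10, 25, 50, 100, 500, Inf)

# Label every column; right = FALSE gives [10, 25), [25, 50), ...
labelled <- test1
labelled[] <- lapply(test1, function(x)
  cut(x, breaks = breaks, labels = letters[1:6], right = FALSE))

# Boundary check on a few hand-picked values
edge <- cut(c(9, 10, 25, 499, 500), breaks = breaks,
            labels = letters[1:6], right = FALSE)
as.character(edge)  # "a" "b" "c" "e" "f"

# Frequency table: one row per variable, one column per range label
freq <- t(sapply(labelled, table))
freq
```

Because cut() returns factors that all share the six levels, table() reports a count (possibly zero) for every label, so the per-column results line up into one matrix.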

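As for why running all six replacement commands together ends in all "f": a likely explanation is that the first assignment of a letter turns the affected columns into character columns, so every later comparison is a string comparison, in which letters sort after digits in common locales. The sketch below shows the same effect on a plain vector, on the assumption that dataframe columns behave the same way; the names x, orig and lab are illustrative. Making the comparisons on the untouched numeric copy avoids the problem.

```r
x <- c(5, 12, 700)

x[x < 10] <- "a"   # assigning a string coerces the whole vector
x                  # "a" "12" "700"
class(x)           # "character"

# From here on, 500 is converted to "500" and compared as text,
# so the result no longer reflects numeric order.
x >= 500

# Safer pattern: compare against the numeric original, write into a copy
orig <- c(5, 12, 700)
lab <- orig
lab[orig < 10] <- "a"
lab[orig >= 10 & orig < 25] <- "b"
lab[orig >= 500] <- "f"
lab                # "a" "b" "f"
```

In the question's code, the equivalent change would be to build each condition from test1 rather than from test, although the cut() approach above is shorter.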