Reputation: 129
I need to replace strings with numbers across multiple columns. Below is a sample data set:
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")
sam <- cbind(x,y,z)
I need to convert the "High/Low Outliers" to 0, The NA's to be left as NA, "Novice" to 1, "Emerging" to 2, "Approaching to 3, and "Proficient" to 4.
I have tried to convert a single variable with
sam$x.r <- recode(sam$x.r,'Low Outlier'=0,'High Outlier'=0,'Novice'=1,'Emerging'=2,'Approaching'=3, 'Proficient'=4)
I received an error message of "Warning message:
In recode.numeric(Dat17_18.1$I.E.ScoreStat, Low Outlier
= 0, High Outlier
= 0, :
NAs introduced by coercion"
I am not sure how to recode all of the variables at once.
Upvotes: 3
Views: 2007
Reputation: 20329
Another solution using factors
to recode, and approxfun
to assign values:
sam[] <- approxfun(1:5, c(0:3, 0))(
as.numeric(factor(sam,
c("Low Outlier", "Novice",
"Emerging", "Approaching",
"Proficient", "High Outlier"))))
# x y z
# [1,] "0" "1" NA
# [2,] NA "3" "0"
# [3,] "1" "0" "3"
# [4,] "1" "3" "2"
# [5,] "2" NA "0"
# [6,] NA "0" "3"
# [7,] "0" NA "3"
# [8,] "3" "2" "2"
Upvotes: 0
Reputation: 13319
Got really repetitive real quick. Here's a simple function:
my_replacer<-function(df,y,z){
df<-as.data.frame(apply(df,2,function(x) gsub(y,z,x)))
#y is what you want to replace
#z is the replacement
#This uses regex
df
}
my_replacer(sam,"Emerging.*","2")
Here is how I've used it:
library(dplyr)#can use ifelse. Still repetitive
sam<-as.data.frame(sam)
sam %>%
mutate_if(is.factor,as.character)->sam
my_replacer(sam,"Emerging.*","2")
Result:
x y z
1 Low Outlier Novice High Outlier
2 High Outlier Approaching Proficient
3 Novice Proficient Approaching
4 Novice Approaching 2
5 2 High Outlier Low Outlier
6 <NA> Proficient Approaching
7 Proficient <NA> Approaching
8 Approaching 2 2
Replace others:
my_replacer(sam,"Novi.*","1")
x y z
1 Low Outlier 1 High Outlier
2 High Outlier Approaching Proficient
3 1 Proficient Approaching
4 1 Approaching Emerging
5 Emerging High Outlier Low Outlier
6 <NA> Proficient Approaching
7 Proficient <NA> Approaching
8 Approaching Emerging Emerging
Upvotes: 2
Reputation: 2650
I would use named vectors as a mapping
library(dplyr)
mapping = c("High Outlier" = 0, "Low Outlier" = 0, "Novice" = 1, "Emerging" = 2, "Approaching" = 3, "Proficient" = 4)
sam %>%
as.data.frame() %>%
mutate_all(function(i) mapping[i])
Upvotes: 2
Reputation: 2764
Simply do this-
sam[] <- recode(sam,'Low Outlier'=0,
'High Outlier'=0,
'Novice'=1,
'Emerging'=2,
'Approaching'=3,
'Proficient'=4)
> sam
x y z
[1,] "0" "1" "0"
[2,] "0" "3" "4"
[3,] "1" "4" "3"
[4,] "1" "3" "2"
[5,] "2" "0" "0"
[6,] NA "4" "3"
[7,] "4" NA "3"
[8,] "3" "2" "2"
Upvotes: 4
Reputation: 388982
We can use case_when
from dplyr
for such cases
library(dplyr)
sam %>%
mutate_all(~case_when(. %in% c("Low Outlier", "High Outlier") ~ '0',
. == "Novice" ~ '1',
. == "Emerging" ~ '2',
. == "Approaching" ~ '3',
. == "Proficient" ~ '4',
TRUE ~ NA_character_))
# x y z
#1 0 1 0
#2 0 3 4
#3 1 4 3
#4 1 3 2
#5 2 0 0
#6 <NA> 4 3
#7 4 <NA> 3
#8 3 2 2
However, the final output has character columns since our original columns were characters as well. We can add mutate_all(as.numeric)
to convert them to numeric if needed.
data
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA,
"Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier",
"Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier",
"Approaching", "Approaching", "Emerging")
sam <- data.frame(x,y,z, stringsAsFactors = FALSE)
Upvotes: 2