Reputation: 23
I have a data frame with several columns. Here is an example of one of the columns:
df <- data.frame(x=1:3)
The number 1 stands for "Yes", 2 stands for "No" and 3 stands for "Maybe". One solution that I came up with was to change the class of the variable to character and then use:
df$x <- replace(df$x, df$x == "1", "Yes")
and repeat it for "No" and "Maybe".
However, one of the columns has 27 different values that stand for 27 different words, so the code would get too long this way.
Any ideas on how to replace the numbers with the words efficiently?
Upvotes: 2
Views: 193
Reputation: 378
Another way, maybe:
df <- sample(1:3, 100, replace = TRUE)
a <- c(1,2,3)
b <- c('Yes', 'No', 'Maybe')
df[df %in% a] <- na.omit(b[match(df, a)]) # useful when there are fewer values to replace than appear in the original
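The same match() idea can be applied directly to a data frame column. The following is only a sketch: the `lookup` table and the `x_label` column are hypothetical names, and the code 4 is added purely to show what happens to a value with no label.

```r
# Hypothetical lookup table: one row per code. With 27 codes, only this
# table grows; the recoding line below stays the same.
lookup <- data.frame(code  = c(1, 2, 3),
                     label = c("Yes", "No", "Maybe"),
                     stringsAsFactors = FALSE)

# Example data; 4 has no entry in the lookup table.
df <- data.frame(x = c(2, 3, 1, 4))

# match() returns the row of `lookup` for each code (NA if absent),
# and indexing `label` with it yields the corresponding word.
df$x_label <- lookup$label[match(df$x, lookup$code)]
df$x_label
# [1] "No"    "Maybe" "Yes"   NA
```

Unlike the snippet above, this version writes to a new column and turns unmatched codes into NA rather than leaving them unchanged.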
Or treat it as a factor:
df <- sample(1:3, 100, replace = TRUE)
df <- as.character(factor(df, labels = c('Yes', 'No', 'Maybe')))
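One caveat with the factor approach: without `levels`, the labels are assigned to the sorted unique values found in the data, so a code that happens to be missing shifts the labels. A minimal sketch that pins each code to its label, using made-up example data for the question's column:

```r
# Hypothetical example data; the codes 1, 2, 3 follow the question.
df <- data.frame(x = c(3, 1, 2, 1))

# Giving `levels` explicitly ties each code to its label, even if a code
# does not occur in the data.
df$x <- as.character(factor(df$x,
                            levels = c(1, 2, 3),
                            labels = c("Yes", "No", "Maybe")))
df$x
# [1] "Maybe" "Yes"   "No"    "Yes"
```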
Upvotes: 0
Reputation: 1075
You could use mapvalues() from plyr:
library(plyr)
x <- c("a", "b", "c")
mapvalues(x, c("a", "c"), c("A", "C"))
[1] "A" "b" "C"
In your case,
df <- data.frame(x=1:3)
mapvalues(df$x, c(1,3,2), c("Yes","Maybe","No"))
[1] "Yes" "No" "Maybe"
Since plyr is retired, you can do it without calling the package using the following (copied straight from body(mapvalues)).
my_mapvalues <- function(x, from, to, warn_missing = TRUE) {
  if (length(from) != length(to)) {
    stop("`from` and `to` vectors are not the same length.")
  }
  if (!is.atomic(x)) {
    stop("`x` must be an atomic vector.")
  }
  # For factors, remap the levels rather than the underlying integer codes.
  # Recurse into my_mapvalues() so that plyr is not needed.
  if (is.factor(x)) {
    levels(x) <- my_mapvalues(levels(x), from, to, warn_missing)
    return(x)
  }
  # Position of each element of `x` in `from` (NA where there is no match).
  mapidx <- match(x, from)
  mapidxNA <- is.na(mapidx)
  from_found <- sort(unique(mapidx))
  if (warn_missing && length(from_found) != length(from)) {
    message("The following `from` values were not present in `x`: ",
            paste(from[!(seq_along(from) %in% from_found)], collapse = ", "))
  }
  # Replace only the matched elements; unmatched values are left unchanged.
  x[!mapidxNA] <- to[mapidx[!mapidxNA]]
  x
}
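Since the question mentions several columns, one option is to apply a mapping over all of them with lapply(). This is only a sketch: to keep it runnable on its own it uses a small match()-based helper (the name `recode_codes` is hypothetical) instead of the function above, and it assumes every listed column shares the same codes.

```r
# Hypothetical helper: looks each value of `x` up in `from` and returns the
# matching entry of `to`. Values not found in `from` become NA.
recode_codes <- function(x, from, to) to[match(x, from)]

# Example data with two coded columns (made up for illustration).
df <- data.frame(x = c(1, 2, 3), y = c(3, 3, 1))

# Columns to recode; extend this vector as needed.
cols <- c("x", "y")
df[cols] <- lapply(df[cols], recode_codes,
                   from = c(1, 2, 3),
                   to   = c("Yes", "No", "Maybe"))
df$y
# [1] "Maybe" "Maybe" "Yes"
```

If each column has its own set of codes, the same pattern should work with a separate `from`/`to` pair per column.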
Upvotes: 2