Reputation: 618
I'd like to classify some data into factor levels, so I wrote a function that takes an input and returns the corresponding level from a factor. The problem is that the result I get is the integer value of the factor, not the factor. Here is some sample code.
data <- data.frame(a = 1:10)
find_class <- function(i) {
classes <- factor(c('A', 'B', 'C'))
ifelse(i %in% c(1, 3, 5), classes[1],
ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
}
data$class <- find_class(data$a)
Thus data$class is of type int. How can I get data$class to be a factor?
Also, since the breaks are not based on a simple value range, I can't use cut (which would otherwise work fine).
Upvotes: 2
Views: 1131
Reputation: 42564
The latest release of the fct_collapse() function from the forcats package can be used in place of OP's own find_class() function. Please make sure to install the development version 0.4.0.9000 from GitHub instead of the CRAN version 0.4.0 by running
devtools::install_github("tidyverse/forcats")
Then,
data$class <- forcats::fct_collapse(as.factor(data$a),
A = c("1", "3", "5"), B = c("2", "4", "9"),
other_level = "C")
data
returns
    a class
1   1     A
2   2     B
3   3     A
4   4     B
5   5     A
6   6     C
7   7     C
8   8     C
9   9     B
10 10     C
str(data)
'data.frame': 10 obs. of 2 variables:
 $ a    : int 1 2 3 4 5 6 7 8 9 10
 $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3
Another approach is to create a lookup table from a named list:
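find_class <- function(i, classes) {
  # long has the columns value (the inputs) and L1 (the class names)
  long <- reshape2::melt(classes)
  # inputs without a match fall back to the row whose value is NA
  as.factor(long$L1[match(i, long$value, nomatch = which(is.na(long$value)))])
}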
data$class <- find_class(data$a, list(A = c(1, 3, 5), B = c(2, 4, 9), C = NA))
data
    a class
1   1     A
2   2     B
3   3     A
4   4     B
5   5     A
6   6     C
7   7     C
8   8     C
9   9     B
10 10     C
str(data)
'data.frame': 10 obs. of 2 variables:
 $ a    : int 1 2 3 4 5 6 7 8 9 10
 $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3
The advantage is that the classification is not hard-coded but can be passed in a compact way as an additional parameter. Thus, the number of classes can be modified easily without having to deal with nested ifelse().
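To illustrate how easily the classes can be changed, here is a minimal sketch that adds a fourth class. It is a base-R variant of the same lookup-table idea, using stack() instead of reshape2::melt(); the function name find_class_lookup and the default parameter are hypothetical and only serve this illustration.

```r
# Hypothetical base-R variant of the lookup-table approach.
# `default` names the class for inputs not listed in `classes`.
find_class_lookup <- function(i, classes, default) {
  long <- stack(classes)  # data frame with columns values and ind
  out <- as.character(long$ind[match(i, long$values)])
  out[is.na(out)] <- default  # unmatched inputs get the default class
  factor(out, levels = c(names(classes), default))
}

res <- find_class_lookup(
  1:10,
  list(A = c(1, 3, 5), B = c(2, 4, 9), D = c(6, 7)),
  default = "C"
)
print(res)
```

Only the list passed as the second argument changes; the function body stays the same.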
Upvotes: 0
Reputation: 1
I think I have figured it out. Take a close look at the source code of ifelse by typing its name without parentheses. You will see a segment of code like the one below:
ans <- test
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
ans[npos] <- rep(no, length.out = len)[npos]
ans
That is, ifelse assigns the value of rep(yes, length.out = len)[ypos] into the logical vector ans. However, when the value from rep() is a factor, the assignment keeps only the factor's integer codes and drops its levels, so ifelse does not give what you want.
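The coercion can be reproduced without ifelse. The following is a small sketch that mimics the assignment step on a hand-made logical vector; the variable names are chosen for illustration only.

```r
classes <- factor(c("A", "B", "C"))

# Start from a logical vector, as ifelse() does with `test`
ans <- c(TRUE, FALSE, TRUE)

# Assigning factor values keeps only their integer codes
ans[c(1, 3)] <- classes[c(2, 3)]

print(typeof(ans))     # storage type after the assignment
print(is.factor(ans))  # the factor class and levels are gone
```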
Possible solution:
find_class <- function(i) {
  classes <- c("A", "B", "C")
  # ifelse() on character values returns a character vector
  outcome <- ifelse(i %in% c(1, 3, 5), classes[1],
                    ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
  as.factor(outcome)
}
find_class(data$a)
This works because a logical vector can take character values and convert itself into a character vector, while the one in your function gets coerced to an integer vector.
Upvotes: 0
Reputation: 572
One more option - using a general mapping function as parameter:
factorize = function(
data,
mapping=function(v)
ifelse(v %in% c(1, 3, 5), "A",
ifelse(v %in% c(2, 4, 9), "B", "C"))
) {
as.factor(mapping(data))
}
That gives:
> factorize(1:10)
[1] A B A B A C C C B C
Levels: A B C
And now an option with a mapping vector instead of a mapping function:
factorize = function(
data,
mapping=c("1"="A", "2"="B", "3"="A", "4"="B", "5"="A", "9"="B"),
default="C"
) {
data = mapping[as.character(data)]
data[is.na(data)] = default
names(data) = NULL
as.factor(data)
}
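No output is shown for the mapping-vector version. As a rough sketch, the same steps can be run outside the function on 1:10; the variable names looked_up and result are made up for this illustration.

```r
mapping <- c("1" = "A", "2" = "B", "3" = "A", "4" = "B", "5" = "A", "9" = "B")

# Indexing a named vector by name returns NA for names it does not contain
looked_up <- unname(mapping[as.character(1:10)])

# Replace the unmatched entries with the default class
looked_up[is.na(looked_up)] <- "C"

result <- as.factor(looked_up)
print(result)
```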
Upvotes: 0
Reputation: 6499
You can use the levels of the variable classes and the output of the ifelse statement as follows:
data <- data.frame(a = 1:10)
find_class <- function(i) {
classes <- factor(c('A', 'B', 'C'))
idx <- ifelse(i %in% c(1, 3, 5), classes[1],
ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
res <- levels(classes)[idx]
factor(res, levels(classes))
}
data$class <- find_class(data$a)
data$class
# [1] A B A B A C C C B C
# Levels: A B C
data
# a class
# 1 1 A
# 2 2 B
# 3 3 A
# 4 4 B
# 5 5 A
# 6 6 C
# 7 7 C
# 8 8 C
# 9 9 B
# 10 10 C
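The key step above is indexing levels(classes) with the integer codes that ifelse returns. A minimal sketch of just that step, with a hand-written codes vector standing in for the ifelse output:

```r
classes <- factor(c("A", "B", "C"))

# Integer codes, as ifelse() would return them for factor inputs
codes <- c(1L, 2L, 1L, 3L)

# Map each code back to its label, then rebuild the factor
labels <- levels(classes)[codes]
restored <- factor(labels, levels = levels(classes))
print(restored)
```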
Upvotes: 1
Reputation: 6567
It's the return value of ifelse that is causing the problem. If I use case_when from dplyr, it works.
library(dplyr)
data <- data.frame(a = 1:10)
find_class <- function(i) {
classes <- factor(c('A', 'B', 'C'))
case_when(
i %in% c(1,3,5) ~ classes[1],
i %in% c(2,4,9) ~ classes[2],
TRUE ~ classes[3]
)
}
data$class <- find_class(data$a)
str(data)
# 'data.frame': 10 obs. of 2 variables:
# $ a : int 1 2 3 4 5 6 7 8 9 10
# $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3
Upvotes: 2