Daniel

Reputation: 618

Return factor level from a function, not an integer in R

I'd like to classify some data into factor levels, so I wrote a function that takes an input and returns the corresponding level of a factor. The problem is that the result is the factor's integer code, not the factor itself. Here is some sample code:

data <- data.frame(a = 1:10)

find_class <- function(i) {

  classes <- factor(c('A', 'B', 'C'))

  ifelse(i %in% c(1, 3, 5), classes[1], 
         ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
}

data$class <- find_class(data$a)

As a result, data$class is an integer vector. How can I get data$class to be a factor?

Also, since the classes are not based on simple value ranges, I can't use cut() (which would otherwise work fine).

Upvotes: 2

Views: 1131

Answers (5)

Uwe

Reputation: 42564

The latest version of the fct_collapse() function from the forcats package can be used in place of the OP's own find_class() function. Make sure to install the development version 0.4.0.9000 from GitHub instead of CRAN version 0.4.0 by running

devtools::install_github("tidyverse/forcats")

Then,

data$class <- forcats::fct_collapse(as.factor(data$a),  
                                    A = c("1", "3", "5"), B = c("2", "4", "9"),
                                    other_level = "C")
data

returns

    a class
1   1     A
2   2     B
3   3     A
4   4     B
5   5     A
6   6     C
7   7     C
8   8     C
9   9     B
10 10     C
str(data)
'data.frame': 10 obs. of  2 variables:
 $ a    : int  1 2 3 4 5 6 7 8 9 10
 $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3
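If installing the development version of forcats is not an option, base R can do the same collapsing: the factor method of `levels<-` accepts a named list that maps each new level to a group of old levels. A minimal sketch, assuming every value not listed under A or B should become C (the names `f` and `grp` are just for illustration):

```r
data <- data.frame(a = 1:10)

# Start from a factor whose levels are the distinct values "1" .. "10"
f <- factor(data$a)

# Named list: new level -> old levels that collapse into it
grp <- list(A = c("1", "3", "5"), B = c("2", "4", "9"))
# Everything not listed above falls into "C"
grp$C <- setdiff(levels(f), unlist(grp))

levels(f) <- grp
data$class <- f
str(data)
```

Unlike the forcats call, the "other" group has to be computed by hand here, which is what the setdiff() line does.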

Another approach is to create a lookup table from a named list:

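find_class <- function(i, classes = list(A = c(1, 3, 5), B = c(2, 4, 9), C = NA)) {
  # Long format: one row per (value, class name); the class name is in column L1
  long <- reshape2::melt(classes)
  # Values not listed anywhere are matched to the row whose value is NA (class C)
  as.factor(long$L1[match(i, long$value, nomatch = which(is.na(long$value)))])
}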

data$class <- find_class(data$a, list(A = c(1, 3, 5), B = c(2, 4, 9), C = NA))
data
    a class
1   1     A
2   2     B
3   3     A
4   4     B
5   5     A
6   6     C
7   7     C
8   8     C
9   9     B
10 10     C
str(data)
'data.frame': 10 obs. of  2 variables:
 $ a    : int  1 2 3 4 5 6 7 8 9 10
 $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3

The advantage is that the classification is not hard-coded but is passed compactly as an additional parameter. Thus, the number of classes can be changed easily without having to deal with nested ifelse() calls.

data$class <- find_class(data$a)
data
    a class
1   1     A
2   2     B
3   3     A
4   4     B
5   5     A
6   6     C
7   7     C
8   8     C
9   9     B
10 10     C
str(data)
'data.frame': 10 obs. of  2 variables:
 $ a    : int  1 2 3 4 5 6 7 8 9 10
 $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3
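To make the "not hard-coded" point concrete, here is a sketch of the same lookup-table idea in base R, swapping in `stack()` for `reshape2::melt()` so no extra package is needed. The function name `find_class2`, its `default` argument and the four-class example are illustrative assumptions, not part of the answer above:

```r
find_class2 <- function(i, classes, default = "C") {
  # stack() turns list(A = c(1, 3, 5), ...) into a two-column data frame:
  # 'values' (the codes) and 'ind' (the class name, as a factor)
  long <- stack(classes)
  # Look up each input value; anything unmatched gets the default class
  out <- as.character(long$ind)[match(i, long$values)]
  out[is.na(out)] <- default
  # Keep the listed classes in order, followed by the default
  factor(out, levels = unique(c(names(classes), default)))
}

# Same classification as in the question
r1 <- find_class2(1:10, list(A = c(1, 3, 5), B = c(2, 4, 9)))

# A different scheme: only the list changes, no nested ifelse() to edit
r2 <- find_class2(1:10, list(A = 1:2, B = 3:4, D = 10), default = "C")
```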

Upvotes: 0

zheng chandler

Reputation: 1

I think I've figured it out. Take a close look at the source code of ifelse by typing its name without parentheses. You will see this segment of code:

ans <- test
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
    ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
    ans[npos] <- rep(no, length.out = len)[npos]
ans

That is, ifelse() starts with ans, a copy of the logical vector test, and fills it with rep(yes, length.out = len)[ypos] (and likewise for no). When yes is a factor, this subassignment keeps only the factor's underlying integer codes and drops its class and levels, so ans becomes an integer vector and ifelse() does not return what you want.
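The coercion can be reproduced directly. A small sketch (the variable names are arbitrary) that assigns a factor element into a logical vector, the same way ifelse() fills ans:

```r
yes <- factor(c("A", "B", "C"))

# ans starts as a copy of the logical test vector, as in ifelse()
ans <- c(TRUE, FALSE)
ans[1] <- yes[1]   # subassigning a factor keeps only its integer code
class(ans)         # "integer": the factor's class and levels are gone

# ifelse() itself behaves the same way and returns the codes 1 and 2
codes <- ifelse(c(TRUE, FALSE), yes[1], yes[2])
```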

Possible solution:

find_class <- function(i) {
  classes <- c("A", "B", "C")
  outcome <- ifelse(i %in% c(1, 3, 5), classes[1],
                    ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
  as.factor(outcome)
}
find_class(data$a)

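This works because assigning character values into the logical vector ans turns it into a character vector, whereas in your function the factor values are stored as their integer codes, so ans becomes an integer vector. as.factor() then converts the character result into a factor.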
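One caveat with as.factor() here: it creates levels only for values that actually occur, so a class that no input falls into disappears from the levels. A variant that passes `levels = classes` explicitly keeps all three levels; the name `find_class_all` is just illustrative:

```r
find_class_all <- function(i) {
  classes <- c("A", "B", "C")
  outcome <- ifelse(i %in% c(1, 3, 5), classes[1],
                    ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))
  # Fix the level set up front instead of inferring it from the data
  factor(outcome, levels = classes)
}

lv_inferred <- levels(as.factor(c("A", "C")))   # "A" "C": level B is lost
lv_fixed <- levels(find_class_all(c(1, 6)))     # "A" "B" "C"
```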

Upvotes: 0

lambruscoAcido

Reputation: 572

One more option: using a general mapping function as a parameter:

factorize = function(
  data,
  mapping=function(v) 
    ifelse(v %in% c(1, 3, 5), "A", 
      ifelse(v %in% c(2, 4, 9), "B", "C"))
) {
  as.factor(mapping(data))
}

That gives:

> factorize(1:10)
[1] A B A B A C C C B C
Levels: A B C

And now an option with a mapping vector instead of a mapping function:

factorize = function(
  data,
  mapping=c("1"="A", "2"="B", "3"="A", "4"="B", "5"="A", "9"="B"),
  default="C"
) {
  data = mapping[as.character(data)]
  data[is.na(data)] = default
  names(data) = NULL
  as.factor(data)
}

Upvotes: 0

B. Christian Kamgang

Reputation: 6499

You can use the integer codes returned by the ifelse() statement to index the levels of classes, and then rebuild the factor:

data <- data.frame(a = 1:10)

find_class <- function(i) {

  classes <- factor(c('A', 'B', 'C'))

  idx <- ifelse(i %in% c(1, 3, 5), classes[1],
                ifelse(i %in% c(2, 4, 9), classes[2], classes[3]))

  res <- levels(classes)[idx]
  factor(res, levels(classes))
}

data$class <- find_class(data$a)

data$class
# [1] A B A B A C C C B C
# Levels: A B C

data
#     a class
# 1   1     A
# 2   2     B
# 3   3     A
# 4   4     B
# 5   5     A
# 6   6     C
# 7   7     C
# 8   8     C
# 9   9     B
# 10 10     C
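The last two steps (indexing levels(classes) with the integer codes, then calling factor()) can also be collapsed into a single factor() call using its labels argument. A sketch, with `idx` standing in for the integer codes that the nested ifelse() returns for a = 1:10:

```r
classes <- factor(c("A", "B", "C"))

# Integer codes as returned by the nested ifelse() for a = 1:10
idx <- c(1L, 2L, 1L, 2L, 1L, 3L, 3L, 3L, 2L, 3L)

# Codes 1, 2, 3 become the labels "A", "B", "C"
res <- factor(idx, levels = seq_along(levels(classes)), labels = levels(classes))
res
# [1] A B A B A C C C B C
# Levels: A B C
```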

Upvotes: 1

Andrew Chisholm

Reputation: 6567

It's the return value of ifelse() that causes the problem. If I use case_when() from dplyr instead, it works:

library(dplyr)

data <- data.frame(a = 1:10)

find_class <- function(i) {
    classes <- factor(c('A', 'B', 'C'))

    case_when(
        i %in% c(1,3,5) ~ classes[1],
        i %in% c(2,4,9) ~ classes[2],
        TRUE ~ classes[3]
    )
}

data$class <- find_class(data$a)

str(data)

# 'data.frame': 10 obs. of  2 variables:
# $ a    : int  1 2 3 4 5 6 7 8 9 10
# $ class: Factor w/ 3 levels "A","B","C": 1 2 1 2 1 3 3 3 2 3

Upvotes: 2
