Apricot

Reputation: 3011

How to get levels for each factor variable in R

I understand that R assigns integer codes to the levels of a factor in alphabetical order. In the following example:

x <- as.factor(c("A","B","C","A","A","A","A","A","A","B","C","B","C","B","C","B","C"))
str(x)

This prints

Factor w/ 3 levels "A","B","C": 1 2 3 1 1 1 1 1 1 2 ...

With only three levels, the association between level and value is easy to read off: A = 1, B = 2, and so on.

In a scenario where a factor has hundreds of levels, is there an easier way to print a table that displays every level along with its value, like this:

Levels  Values
A        1
B        2
C        3

Upvotes: 2

Views: 4380

Answers (1)

eipi10

Reputation: 93761

Why do you want to know the underlying numeric values that R assigns to each factor level? I ask because this generally wouldn't be an important thing to keep track of. Can you say more about what you're trying to accomplish? We may be able to provide additional advice if we know more about the underlying problem you're trying to solve. For now, below are examples of how to do what you ask that also show why the results might not be what you expect.
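For the single vector in the question, a minimal sketch of the requested table could look like the following. It relies on the fact that `levels()` returns the labels in the order of their integer codes, so a label's position in that vector is its code. The name `level_table` is made up for illustration.

```r
# Single factor from the question
x <- factor(c("A","B","C","A","A","A","A","A","A","B","C","B","C","B","C","B","C"))

# levels(x) is ordered by integer code, so position in levels(x) is the code
level_table <- data.frame(Levels = levels(x),
                          Values = seq_along(levels(x)))
print(level_table)

# Cross-check: the code stored for each element looks up its own label
stopifnot(identical(levels(x)[as.integer(x)], as.character(x)))
```

Because the table is built from `levels(x)` rather than from the observed values, it also lists levels that happen not to occur in the data.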

Do all the columns in your data frame have different combinations of the same underlying categories? If not, what you're asking for could give unexpected and undesirable results. Below are a couple of examples, based on a fake data frame with 3 factor columns, two of which are upper case letters and one of which is lower case letters.

# Fake data
set.seed(2)
x = c("C","A","B","C","A","A","A","A","A","A","B","C","B","C","B","C","B","C")
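# stringsAsFactors=TRUE is needed in R >= 4.0.0, where character columns
# are no longer converted to factors by default
dat = data.frame(x=x,
                 y=sample(LETTERS[1:5], length(x), replace=TRUE),
                 z=sample(letters[1:3], length(x), replace=TRUE),
                 w=rnorm(length(x)),
                 stringsAsFactors=TRUE)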

Note that the numeric codes assigned to each factor level are not unique across columns. The lower case letters and the upper case letters can both have factor codes 1 through 3.

# Return a list with factor levels and numeric codes for each factor column
lapply(dat[sapply(dat, is.factor)], function(v) {
  # levels(v) is ordered by integer code, so a level's position is its code
  data.frame(Levels=levels(v),
             Values=seq_along(levels(v)))
})
$x
  Levels Values
1      A      1
2      B      2
3      C      3

$y
  Levels Values
1      A      1
2      B      2
3      C      3
4      D      4
5      E      5

$z
  Levels Values
1      a      1
2      b      2
3      c      3

Another potential complication is whether the order of the factor levels is the same in different columns. As an example, let's change the level order for one of the upper case columns. This creates a new issue: the same letter can have a different code in different columns, and the same code can be assigned to different letters. For example, A has code 1 in column x and code 5 in column y. Furthermore, code 1 is assigned to E in column y, rather than to A.

dat$y = factor(dat$y, levels = LETTERS[5:1])

# Return a list with factor levels and numeric codes for each factor column
lapply(dat[sapply(dat, is.factor)], function(v) {
  # levels(v) is ordered by integer code, so a level's position is its code
  data.frame(Levels=levels(v),
             Values=seq_along(levels(v)))
})
$x
  Levels Values
1      A      1
2      B      2
3      C      3

$y
  Levels Values
1      E      1
2      D      2
3      C      3
4      B      4
5      A      5

$z
  Levels Values
1      a      1
2      b      2
3      c      3
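If one combined table is easier to scan than a list, the per-column tables can be stacked into a single long data frame with a column identifier. The sketch below uses a tiny made-up data frame rather than the fake data above, and the names `is_fac`, `tables`, and `all_codes` are illustrative only.

```r
# Two small factors with the same labels but opposite level order (illustrative data)
dat <- data.frame(
  x = factor(c("A", "B", "C")),
  y = factor(c("A", "B", "C"), levels = c("C", "B", "A"))
)

# Build one Levels/Values table per factor column, tagged with the column name
is_fac <- vapply(dat, is.factor, logical(1))
tables <- lapply(names(dat)[is_fac], function(nm) {
  data.frame(Column = nm,
             Levels = levels(dat[[nm]]),
             Values = seq_along(levels(dat[[nm]])))
})

# Stack the per-column tables into a single long data frame
all_codes <- do.call(rbind, tables)
print(all_codes)
```

In this small example A has code 1 in column x and code 3 in column y, which is the same mismatch shown above: a code only has meaning together with the column it came from.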

Upvotes: 3
