Gregg H

How can I get table() in R to output names in factor-level order instead of alphabetical order?

I have a variable coded 1:3 that should be interpreted as c("M","F","NB"), i.e. 1 = "M", 2 = "F", 3 = "NB". (This is a column in a large data frame.) When I convert it with

df$gend <- factor(df$Q99,labels=c("M","F","NB"),levels=1:3)

it converts fine. But when I use apply() on the entire data frame with FUN = table, the results are ordered alphabetically instead of in factor-level order.

However, when I try to replicate this with a stand-alone toy example:

table(factor(c(1,1,1,1,1,2,3,3,3),labels=c("M","F","NB"),levels=1:3))

the result is as expected, in the order M, F, NB.

I have read through the help page ?table, but I cannot figure out how it decides the order of the output in the frequency table.

If there is an argument to either table() or apply() that controls this, I would love to know what it is.
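A minimal sketch that reproduces the discrepancy in one place. The small data frame below is hypothetical and only stands in for the question's large one: table() on the factor column keeps the level order, while the same column passed through apply() comes back sorted.

```r
# Hypothetical toy data frame mirroring the question's setup
df <- data.frame(Q99 = c(1, 1, 1, 1, 1, 2, 3, 3, 3))
df$gend <- factor(df$Q99, levels = 1:3, labels = c("M", "F", "NB"))

# Called directly on the factor column, table() follows the level order
direct <- table(df$gend)
names(direct)          # "M"  "F"  "NB"

# Called through apply(), the column is tabulated as plain character values
via_apply <- apply(df["gend"], 2, table)
rownames(via_apply)    # "F"  "M"  "NB"
```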


Answers (1)

akrun

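If the OP used apply with MARGIN = 2, apply first converts the data.frame to a matrix, and in that conversion the factor columns become character. table() then tabulates plain character values, which it sorts, so the factor-level order is lost (and levels with zero counts are dropped). Instead, to apply table to every column of the dataset, use lapply, which passes each column unchanged: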

lapply(df, table)
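Applied to a stand-in for the question's data (the data frame below is hypothetical), lapply() keeps gend as a factor, so its table comes out in the order M, F, NB:

```r
# Hypothetical stand-in for the question's data frame
df <- data.frame(Q99 = c(1, 1, 1, 1, 1, 2, 3, 3, 3))
df$gend <- factor(df$Q99, levels = 1:3, labels = c("M", "F", "NB"))

# lapply() hands each column to table() as-is, so gend is still a factor
tabs <- lapply(df, table)
tabs$gend              # names in level order: M, F, NB
```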

Reproducible example

set.seed(24)
df <- as.data.frame(matrix(sample(letters[1:6], 10 *5, replace = TRUE), 10, 5))
df[] <- lapply(df, factor, levels = letters[1:6])

Now, compare apply and lapply:

apply(df, 2, table)
lapply(df, table)
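Because letters[1:6] is already in alphabetical order, the example above cannot show a reordering; the visible difference there is that apply's tables omit letters absent from a column, while lapply's tables keep every level with a zero count. The sketch below is a variant with the level order reversed (an added assumption, not part of the original example) so that the reordering becomes visible. The second table is built from as.matrix(df) to mimic what apply hands to table, rather than calling apply itself.

```r
set.seed(24)
df <- as.data.frame(matrix(sample(letters[1:6], 10 * 5, replace = TRUE), 10, 5))
# Variant: declare the levels in reverse order so level order differs from alphabetical
df[] <- lapply(df, factor, levels = rev(letters[1:6]))

# lapply(): each column reaches table() as a factor, so the declared order survives
tab_lapply <- lapply(df, table)
names(tab_lapply$V1)   # "f" "e" "d" "c" "b" "a"

# What apply() effectively tabulates: a character column of as.matrix(df),
# whose table is sorted and omits letters that do not occur in the column
tab_char <- table(as.matrix(df)[, "V1"])
names(tab_char)
```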

The reason is that apply converts the data frame to a matrix under the hood, as seen in the source code of apply:

...

if (is.object(X)) 
        X <- if (dl == 2L) 
            as.matrix(X)
        else as.array(X)
        
 ...

According to ?apply,

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
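A short sketch of that coercion in isolation, using a hypothetical one-column data frame whose levels are deliberately not alphabetical: as.matrix() returns a character matrix, and tabulating its column gives sorted names instead of the level order.

```r
# Hypothetical one-column data frame with a non-alphabetical level order
df <- data.frame(gend = factor(c("M", "F", "NB", "M"), levels = c("M", "F", "NB")))

m <- as.matrix(df)     # the coercion apply() performs on a data frame
typeof(m)              # "character": the factor class and levels are gone
table(m[, "gend"])     # names sorted: F, M, NB
table(df$gend)         # names in level order: M, F, NB
```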

and this coercion strips the factor attributes (class and levels), so every column is passed to FUN as a character vector:

apply(df, 2, class)
#         V1          V2          V3          V4          V5 
#"character" "character" "character" "character" "character" 

whereas

lapply(df, class)
#$V1
#[1] "factor"

#$V2
#[1] "factor"

#$V3
#[1] "factor"

#$V4
#[1] "factor"

#$V5
#[1] "factor"
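If one matrix of counts is preferred over a list, sapply() can be used the same way as lapply(). This sketch relies on every column sharing the same levels, so each table has the same length and sapply() can simplify them into a single matrix whose rows follow the level order; if the tables differ in length, sapply() returns a list instead.

```r
set.seed(24)
df <- as.data.frame(matrix(sample(letters[1:6], 10 * 5, replace = TRUE), 10, 5))
df[] <- lapply(df, factor, levels = letters[1:6])

# All columns share the same six levels, so every table has length 6
# and sapply() simplifies the list of tables into a 6 x 5 matrix
counts <- sapply(df, table)
dim(counts)            # 6 5
rownames(counts)       # "a" "b" "c" "d" "e" "f"
```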
