Reputation: 178
I have a variable coded 1:3 that should be interpreted as c("M","F","NB"). (This is a column in a large data frame.) When I convert this with
df$gend <- factor(df$Q99,labels=c("M","F","NB"),levels=1:3)
it works fine. But when I call apply(·) on the entire data.frame with FUN=table, the results are ordered alphabetically instead of by factor-label order.
But, when I try to replicate this in a stand-alone toy data set:
table(factor(c(1,1,1,1,1,2,3,3,3),labels=c("M","F","NB"),levels=1:3))
the result is as would be expected, in the order of M, F, NB.
I have read through the help for ?table, and I cannot figure out how it decides to order (or not order) the output of the frequency table. If there is an argument to either table(·) or apply(·) that controls this, I would love to know what it is.
Upvotes: 1
Views: 175
Reputation: 887691
If the OP used apply with MARGIN = 2, it will coerce the data.frame to a matrix, and the factor columns will become character. table then rebuilds the levels from the sorted strings, which gives the alphabetical order. Instead, to apply table to every column of the dataset, use lapply
lapply(df, table)
Reproducible example
set.seed(24)
df <- as.data.frame(matrix(sample(letters[1:6], 10 *5, replace = TRUE), 10, 5))
df[] <- lapply(df, factor, levels = letters[1:6])
Now, we test with apply and lapply
apply(df, 2, table)
lapply(df, table)
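To tie this back to the labels in the question, here is a minimal sketch of the same comparison. The data frame df_toy and its second column other are hypothetical and only for illustration; the first column reuses the question's toy vector.

```r
# Hypothetical two-column data frame; both columns share the levels M, F, NB
lev_labels <- c("M", "F", "NB")
df_toy <- data.frame(
  gend  = factor(c(1, 1, 1, 1, 1, 2, 3, 3, 3), levels = 1:3, labels = lev_labels),
  other = factor(c(3, 2, 1, 1, 2, 3, 3, 2, 1), levels = 1:3, labels = lev_labels)
)

# apply() coerces df_toy to a character matrix first, so table() sorts the strings
apply_order <- rownames(apply(df_toy, 2, table))

# lapply() passes each column through unchanged, so the factor levels survive
lapply_order <- names(lapply(df_toy, table)$gend)

apply_order   # "F" "M" "NB"
lapply_order  # "M" "F" "NB"
```

The counts are the same in both cases; only the order of the categories differs.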
The reason is that apply converts the data.frame to a matrix under the hood (source code of apply)
...
if (is.object(X))
X <- if (dl == 2L)
as.matrix(X)
else as.array(X)
...
According to ?apply,
If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
and this results in the factor attributes being dropped, so the class of each column changes to character
apply(df, 2, class)
# V1 V2 V3 V4 V5
#"character" "character" "character" "character" "character"
whereas
lapply(df, class)
#$V1
#[1] "factor"
#$V2
#[1] "factor"
#$V3
#[1] "factor"
#$V4
#[1] "factor"
#$V5
#[1] "factor"
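If apply has to stay for some reason, one possible workaround is to re-impose the levels inside the function passed to it. This is only a sketch: the names lev, df_q, q1 and q2 are made up for illustration, and it assumes every column shares the same set of levels.

```r
# Hypothetical data: two factor columns with the same intended level order
lev <- c("M", "F", "NB")
df_q <- data.frame(
  q1 = factor(c("M", "M", "NB", "F"), levels = lev),
  q2 = factor(c("NB", "NB", "F", "F"), levels = lev)
)

# apply() hands each column over as character, so rebuild the factor with
# explicit levels before tabulating; absent levels are kept with a count of 0
res <- apply(df_q, 2, function(x) table(factor(x, levels = lev)))

rownames(res)  # "M" "F" "NB"
```

Because every table now has the same length, apply simplifies the result to a matrix with one row per level. lapply(df_q, table) avoids the extra step whenever the columns are already factors.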
Upvotes: 1