smci
smci

Reputation: 33960

How to tabulate sums of factors from a dataframe, keeping labels

I have a dataframe of 11 variables x 60 rows; the entries are two-valued factor ('Male'/'Female') and contain NAs. (Each row is a single household, with up to 11 members. Hence the column names 'A4M1'...'A4M11').

What is the simplest paradigm to tabulate the total counts of 'Male','Female' so they can be barplot'ted? Ideally my output would be a length-2 named vector of numeric, i.e. directly operate on the factor as a factor, so we keep the labels.

I've been trying reshape,melt,cast,stack,tabulate,table,colwise,sum,aggregate,summarise,by,plyr...

My current working code is:

> tabulate(stack(colwise(as.numeric)(myData), na.rm=TRUE)$values)
[1] 162 151

which is bad since I don't want to coerce to numeric and lose the labels.

See my answer below

table(as.matrix(myData))

is also not ideal since it loses the order of factor levels, and returns us alphabetical order of labels.

tabulate() seems to be the most suitable fn. To convert the dataframe to a list I use stack(..., na.rm=TRUE)$values, which is clunky but does the job. However stack() is not ideal since it won't work on factors, so I have to convert it with colwise(as.numeric)(myData), which throws away the labels. I could reapply factor(..., labels=c('Male','Female')) to the output from tabulate(), but that's just clunky. So, can you do better? (Using a standard paradigm, other than writing case-specific code)

> head(myData)
    A4M1 A4M2   A4M3   A4M4   A4M5 A4M6 A4M7 A4M8 A4M9 A4M10 A4M11
1 Female Male   Male   <NA>   Male Male Male <NA> <NA>  <NA>  <NA>
2 Female Male   Male   <NA>   <NA> <NA> <NA> <NA> <NA>  <NA>  <NA>
3 Female Male   Male   <NA>   <NA> <NA> <NA> <NA> <NA>  <NA>  <NA>
4 Female Male Female   <NA>   <NA> <NA> <NA> <NA> <NA>  <NA>  <NA>
5 Female Male   Male Female Female Male <NA> <NA> <NA>  <NA>  <NA>
6 Female Male   Male   <NA>   <NA> <NA> <NA> <NA> <NA>  <NA>  <NA>
7 ...

> str(myData)
'data.frame':   60 obs. of  11 variables:
 $ A4M1 : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 1 2 2 2 ...
 $ A4M2 : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
 $ A4M3 : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 1 2 1 1 2 ...
 $ A4M4 : Factor w/ 2 levels "Male","Female": NA NA NA NA 2 NA NA 1 NA 2 ...
 $ A4M5 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 2 NA NA 1 NA 2 ...
 $ A4M6 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 1 NA NA NA NA NA ...
 $ A4M7 : Factor w/ 2 levels "Male","Female": 1 NA NA NA NA NA NA NA NA NA ...
 $ A4M8 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M9 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M10: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M11: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...

Upvotes: 2

Views: 864

Answers (2)

thelatemail
thelatemail

Reputation: 93938

What if you just unlist your data first? Simple example:

test <- data.frame( 
            A4M1=factor(c(1,2,3,1),levels=(1:3),labels=c("one","two","three")),
            A4M2=factor(c(1,2,2,1),levels=(1:3),labels=c("one","two","three"))
             )

This keeps the factors intact...

str(unlist(test))
 Factor w/ 3 levels "one","two","three": 1 2 3 1 1 2 2 1
 - attr(*, "names")= chr [1:8] "A4M11" "A4M12" "A4M13" "A4M14" ...

...and gives you the table order you are after (i.e. - not alphabetical):

> table(unlist(test))

  one   two three 
    4     3     1 

Upvotes: 1

smci
smci

Reputation: 33960

If you abandon the idea of treating it as a dataframe of factors, you can do the following.

> table(as.matrix(myData))

Female   Male 
   151    162 

However this is undesirable since the order is now indeterminate; table() sorts by alphabetical order of the labels, not numerical order of the levels. In general that would suck; here we can workaround with rev() :

> rev(table(as.matrix(myData)))

Male Female 
 248    253 

Upvotes: 0

Related Questions