How to tabulate sums of factors from a dataframe, keeping labels

Question

I have a dataframe of 11 variables x 60 rows; the entries are two-valued factor ('Male'/'Female') and contain NAs. (Each row is a single household, with up to 11 members. Hence the column names 'A4M1'...'A4M11').

What is the simplest paradigm to tabulate the total counts of 'Male','Female' so they can be barplot'ted? Ideally my output would be a length-2 named vector of numeric, i.e. directly operate on the factor as a factor, so we keep the labels.

I've been trying reshape,melt,cast,stack,tabulate,table,colwise,sum,aggregate,summarise,by,plyr...

My current working code is:

> tabulate(stack(colwise(as.numeric)(myData), na.rm=TRUE)$values)
[1] 162 151

which is bad since I don't want to coerce to numeric and lose the labels.

See my answer below

table(as.matrix(myData))

is also not ideal since it loses the order of factor levels, and returns us alphabetical order of labels.

tabulate() seems to be the most suitable fn. To convert the dataframe to a list I use stack(..., na.rm=TRUE)$values, which is clunky but does the job. However stack() is not ideal since it won't work on factors, so I have to convert it with colwise(as.numeric)(myData), which throws away the labels. I could reapply factor(..., labels=c('Male','Female')) to the output from tabulate(), but that's just clunky. So, can you do better? (Using a standard paradigm, other than writing case-specific code)

> head(myData)
    A4M1 A4M2   A4M3   A4M4   A4M5 A4M6 A4M7 A4M8 A4M9 A4M10 A4M11
1 Female Male   Male      Male Male Male      
2 Female Male   Male              
3 Female Male   Male              
4 Female Male Female              
5 Female Male   Male Female Female Male       
6 Female Male   Male              
7 ...

> str(myData)
'data.frame':   60 obs. of  11 variables:
 $ A4M1 : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 1 2 2 2 ...
 $ A4M2 : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
 $ A4M3 : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 1 2 1 1 2 ...
 $ A4M4 : Factor w/ 2 levels "Male","Female": NA NA NA NA 2 NA NA 1 NA 2 ...
 $ A4M5 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 2 NA NA 1 NA 2 ...
 $ A4M6 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 1 NA NA NA NA NA ...
 $ A4M7 : Factor w/ 2 levels "Male","Female": 1 NA NA NA NA NA NA NA NA NA ...
 $ A4M8 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M9 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M10: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
 $ A4M11: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...

thelatemail · Accepted Answer

What if you just unlist your data first? Simple example:

test <- data.frame( 
            A4M1=factor(c(1,2,3,1),levels=(1:3),labels=c("one","two","three")),
            A4M2=factor(c(1,2,2,1),levels=(1:3),labels=c("one","two","three"))
             )

This keeps the factors intact...

str(unlist(test))
 Factor w/ 3 levels "one","two","three": 1 2 3 1 1 2 2 1
 - attr(*, "names")= chr [1:8] "A4M11" "A4M12" "A4M13" "A4M14" ...

...and gives you the table order you are after (i.e. - not alphabetical):

> table(unlist(test))

  one   two three 
    4     3     1

How to tabulate sums of factors from a dataframe, keeping labels

Answers (2)

Related Questions