Reputation: 33960
I have a dataframe of 11 variables x 60 rows; the entries are two-valued factor ('Male'/'Female') and contain NAs. (Each row is a single household, with up to 11 members. Hence the column names 'A4M1'...'A4M11').
What is the simplest paradigm to tabulate the total counts of 'Male','Female' so they can be barplot'ted? Ideally my output would be a length-2 named vector of numeric, i.e. directly operate on the factor as a factor, so we keep the labels.
I've been trying reshape,melt,cast,stack,tabulate,table,colwise,sum,aggregate,summarise,by,plyr...
My current working code is:
> tabulate(stack(colwise(as.numeric)(myData), na.rm=TRUE)$values)
[1] 162 151
which is bad since I don't want to coerce to numeric and lose the labels.
See my answer below
table(as.matrix(myData))
is also not ideal since it loses the order of factor levels, and returns us alphabetical order of labels.
tabulate()
seems to be the most suitable fn. To convert the dataframe to a list I use stack(..., na.rm=TRUE)$values
, which is clunky but does the job.
However stack()
is not ideal since it won't work on factors, so I have to convert it with colwise(as.numeric)(myData)
, which throws away the labels.
I could reapply factor(..., labels=c('Male','Female'))
to the output from tabulate()
, but that's just clunky.
So, can you do better?
(Using a standard paradigm, other than writing case-specific code)
> head(myData)
A4M1 A4M2 A4M3 A4M4 A4M5 A4M6 A4M7 A4M8 A4M9 A4M10 A4M11
1 Female Male Male <NA> Male Male Male <NA> <NA> <NA> <NA>
2 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
4 Female Male Female <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Female Male Male Female Female Male <NA> <NA> <NA> <NA> <NA>
6 Female Male Male <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 ...
> str(myData)
'data.frame': 60 obs. of 11 variables:
$ A4M1 : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 2 1 2 2 2 ...
$ A4M2 : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
$ A4M3 : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 1 2 1 1 2 ...
$ A4M4 : Factor w/ 2 levels "Male","Female": NA NA NA NA 2 NA NA 1 NA 2 ...
$ A4M5 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 2 NA NA 1 NA 2 ...
$ A4M6 : Factor w/ 2 levels "Male","Female": 1 NA NA NA 1 NA NA NA NA NA ...
$ A4M7 : Factor w/ 2 levels "Male","Female": 1 NA NA NA NA NA NA NA NA NA ...
$ A4M8 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M9 : Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M10: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
$ A4M11: Factor w/ 2 levels "Male","Female": NA NA NA NA NA NA NA NA NA NA ...
Upvotes: 2
Views: 864
Reputation: 93938
What if you just unlist your data first? Simple example:
test <- data.frame(
A4M1=factor(c(1,2,3,1),levels=(1:3),labels=c("one","two","three")),
A4M2=factor(c(1,2,2,1),levels=(1:3),labels=c("one","two","three"))
)
This keeps the factors intact...
str(unlist(test))
Factor w/ 3 levels "one","two","three": 1 2 3 1 1 2 2 1
- attr(*, "names")= chr [1:8] "A4M11" "A4M12" "A4M13" "A4M14" ...
...and gives you the table order you are after (i.e. - not alphabetical):
> table(unlist(test))
one two three
4 3 1
Upvotes: 1
Reputation: 33960
If you abandon the idea of treating it as a dataframe of factors, you can do the following.
> table(as.matrix(myData))
Female Male
151 162
However this is undesirable since the order is now indeterminate; table()
sorts by alphabetical order of the labels, not numerical order of the levels.
In general that would suck; here we can workaround with rev()
:
> rev(table(as.matrix(myData)))
Male Female
248 253
Upvotes: 0