Reputation: 435
Basic dplyr question: I have survey data in which respondents could select multiple companies that they use. For example:
library(dplyr)
library(tidyr)  # needed for gather() below
test <- tibble(
  CompanyA = rep(c(0:1),5),
  CompanyB = rep(c(1),10),
  CompanyC = c(1,1,1,1,0,0,1,1,1,1)
)
test
If it were a forced-choice question - i.e., respondents could make only one selection - I would do the following for a basic summary table:
test %>%
  summarise_all(sum, na.rm = TRUE) %>%  # funs() is deprecated as of dplyr 0.8.0
  gather(Response, n) %>%
  arrange(desc(n)) %>%
  mutate("%" = round(100*n/sum(n)))
Note, however, that the "%" column is not what I want. I'm instead looking for the proportion of total respondents for each individual response option (since they could make multiple selections).
I've tried adding mutate(totalrows = nrow(.)) %>% prior to the summarise_all command. This would allow me to use that variable as the denominator in a later mutate command. However, summarise_all eliminates the "totalrows" var.
Also, if there's a better way to do this, I'm open to ideas.
Upvotes: 1
Views: 425
Reputation: 6778
To get the proportion of respondents who chose an option when that variable is binary, you can take the mean. To do this with your test data, you can use sapply:
sapply(test, mean)
CompanyA CompanyB CompanyC
     0.5      1.0      0.8
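As a minimal sketch of how the "mean of a 0/1 column" idea could produce the full summary table from the question (count plus percentage of respondents), here is a base R version. The names summary_tbl and pct are just illustrative, and note the assumption about missing values in the comments.

```r
# Same toy data as in the question, built as a plain data frame.
test <- data.frame(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# For 0/1 columns, the column sum is the number of selections and the
# column mean is the share of respondents who selected that option.
# Assumption: with na.rm = TRUE the mean is taken over non-missing
# answers only, not over all rows.
n   <- colSums(test, na.rm = TRUE)
pct <- round(100 * colMeans(test, na.rm = TRUE))

summary_tbl <- data.frame(Response = names(n), n = n, pct = pct, row.names = NULL)
summary_tbl <- summary_tbl[order(-summary_tbl$n), ]  # most-selected first
summary_tbl
```

If the denominator should always be the total number of rows, even when some answers are missing, dividing n by nrow(test) instead of using colMeans would be the safer choice.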
If you wanted to do this in a more complicated fashion (say your data is not binary encoded, but is stored as 1 and 2 instead), you could do that with the following:
test %>%
  gather(key='Company') %>%
  group_by(Company) %>%
  summarise(proportion = sum(value == 1) / n())
# A tibble: 3 x 2
  Company  proportion
  <chr>         <dbl>
1 CompanyA        0.5
2 CompanyB        1
3 CompanyC        0.8
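Since gather() has been superseded in tidyr (1.0.0 and later), the same reshape-then-summarise approach can be sketched with pivot_longer(). This assumes tidyr >= 1.0.0 is installed; the variable name result is only for illustration.

```r
library(dplyr)
library(tidyr)

test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

result <- test %>%
  # One row per respondent/company pair.
  pivot_longer(everything(), names_to = "Company", values_to = "value") %>%
  group_by(Company) %>%
  # Share of rows where the option was selected (coded as 1).
  summarise(proportion = sum(value == 1) / n())

result
```

The comparison value == 1 is what makes this work for other codings (such as 1 and 2); for plain 0/1 data, mean(value) would give the same result.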
Upvotes: 3
Reputation: 865
If you put all functions in a list within summarise, then this will work. You'll need to do some quick tidying up after though.
test %>%
  summarise_all(
    list(
      rows = length,
      n = function(x){sum(x, na.rm = TRUE)},
      perc = function(x){sum(x, na.rm = TRUE)/length(x)}
    )) %>%
  tidyr::gather(Response, n) %>%
  tidyr::separate(Response, c("Company", "Metric"), '_') %>%
  tidyr::spread(Metric, n)
And you'll get this:
  Company      n  perc  rows
  <chr>    <dbl> <dbl> <dbl>
1 CompanyA     5   0.5    10
2 CompanyB    10   1      10
3 CompanyC     8   0.8    10
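For what it's worth, here is a rough sketch of how the same idea might look with across() and pivot_longer(), which fold the gather/separate/spread tidy-up into one step. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and, like the separate() call above, it assumes the column names contain no underscores of their own.

```r
library(dplyr)
library(tidyr)

test <- tibble(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(1, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

result <- test %>%
  # across() names the outputs "<column>_<function>", e.g. CompanyA_perc.
  summarise(across(
    everything(),
    list(
      rows = ~ length(.x),
      n    = ~ sum(.x, na.rm = TRUE),
      perc = ~ sum(.x, na.rm = TRUE) / length(.x)
    )
  )) %>%
  # Split "<column>_<function>": the first part becomes the Company column,
  # the second part (".value") becomes the output column names.
  pivot_longer(
    everything(),
    names_to = c("Company", ".value"),
    names_sep = "_"
  )

result
```

This should give one row per company with rows, n and perc columns, matching the table above apart from column order.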
Upvotes: 1
Reputation: 3726
Here is a solution using tidyr::gather:
test %>%
  gather(Company, response) %>%
  group_by(Company) %>%
  summarise(`%` = 100 * sum(response) / n())
Upvotes: 0