Gabby S
Gabby S

Reputation: 47

How to count Yes/No responses and group them?

I have a survey from an excel file that looks like this

  party          question1         question2
1 Republican       Yes                No
2 Democrat         No                 Yes
3 Libertarian      No                 No
4 Green            No                 Yes
5 Republican       Yes                Yes
6 Constitution     Yes                No
7 Democrat         Yes                Yes
8 Democrat         No                 No

I want to have the number of Yes and Nos by party, so something like this

party           Yes   No
Republican       5    4
Democrat         1    3
Libertarian      5    6
Constitution     2    4
Green            4    1

Eventually I want to make a graph out of this. I have been looking all over and trying some stuff, the closest one has been

res1 <-as.data.frame(aggregate(question1, list(party), table))

which results in this

             Group.1 x.No x.Yes
1 constitution          2    12
2   democratic         21   267
3        green          4    21
4  libertarian         12    39
5   republican         27   155

which seems great except when I View(res1) it only shows as "5 obs. of 2 variables". I can only see the Group1 column and the x.No column. I need both Yes and No columns so I can graph them.

I also tried the plyr package but it's not working for me, no idea why. I tried doing it in Rstudio via sqldf using some CASE functions but as much as I tried changing it around it would have an error.

As you can tell I am an absolute beginner, I appreciate any help you can give me.

Upvotes: 2

Views: 11277

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269491

Here are solutions fixing the three approaches (sqldf, aggregate, plyr) attempted in the question. We assume that the data frame input is DF as defined reproducibly in the Note at the end.

1) sqldf With sqldf:

library(sqldf)

sqldf("select party, 
              sum(question1 = 'No') + sum(question2 = 'No') as No,
              sum(question1 = 'Yes') + sum(question2 = 'Yes') as Yes
       from DF
       group by party")

or if you have more than 2 questions dynamically create the SQL statement. The verbose= argument will show the statement it actually sends to SQLite and you can omit it if you don't need that.

library(sqldf)

yes <- paste(sprintf("sum(%s = 'Yes')", names(DF)[-1]), collapse = " + ")
no <- paste(sprintf("sum(%s = 'No')", names(DF)[-1]), collapse = " + ")

fn$sqldf("select party, $no No, $yes Yes from DF group by party", verbose = TRUE)

2) aggregate To do it with aggregate try the following. The aggregate statement creates a two column data frame whose second column is a multicolumn matrix and the last statement, which is optional, converts that to an ordinary 3 column data frame. This also works if there are more than 2 questions.

ag <- aggregate(list(Answer = 1:nrow(DF)), DF["party"], 
  function(i) c(No = sum(DF[i, -1] == 'No'), Yes = sum(DF[i, -1] == 'Yes')))
do.call("data.frame", ag)

or alternately:

yesNo <- data.frame(Yes = rowSums(DF[-1] == "Yes"), No = rowSums(DF[-1] == "No"))
aggregate(yesNo, DF[1], sum)

3) plyr With the plyr package we can use this code:

library(plyr)

ddply(DF, .(party), summarize,
   No = sum(question1 == 'No') + sum(question2 == 'No'),
   Yes = sum(question1 == 'Yes') + sum(question2 == 'Yes'))

or if there can be more than 2 questions:

Count_No <- function(data) sum(data[, -1] == "No")
Count_Yes <- function(data) sum(data[, -1] == "Yes")
ddply(DF, .(party), c(No = Count_No, Yes = Count_Yes))

or alternately using yesNo from (2):

ddply(yesNo, .(party = DF$party), colSums)

Note

The input DF in reproducible form is:

Lines <- "
  party          question1         question2
1 Republican       Yes                No
2 Democrat         No                 Yes
3 Libertarian      No                 No
4 Green            No                 Yes
5 Republican       Yes                Yes
6 Constitution     Yes                No
7 Democrat         Yes                Yes
8 Democrat         No                 No"
DF <- read.table(text = Lines)

Upvotes: 2

akrun
akrun

Reputation: 887048

We can do this with tidyverse by converting to 'long' format with gather, the get the frequency with 'count' and spread it to 'wide' format

library(tidyverse)
gather(df1, key, val, question1:question2) %>%
   count(party, val) %>%
   spread(val, n)

Also, using base R

table(data.frame(df1[1], value = unlist(df1[-1])))

Upvotes: 2

Rui Barradas
Rui Barradas

Reputation: 76402

Here is a way with xtabs, after reformating the data.

long <- reshape2::melt(df1, id.vars = "party")
xtabs( ~ party + value, long)
#              value
#party          No Yes
#  Constitution  1   1
#  Democrat      3   3
#  Green         1   1
#  Libertarian   2   0
#  Republican    1   3

Data.

df1 <- read.table(text = "
party          question1         question2
1 Republican       Yes                No
2 Democrat         No                 Yes
3 Libertarian      No                 No
4 Green            No                 Yes
5 Republican       Yes                Yes
6 Constitution     Yes                No
7 Democrat         Yes                Yes
8 Democrat         No                 No                  
", header = TRUE)

Upvotes: 2

Related Questions