Reputation: 47
I have a survey from an excel file that looks like this
party question1 question2
1 Republican Yes No
2 Democrat No Yes
3 Libertarian No No
4 Green No Yes
5 Republican Yes Yes
6 Constitution Yes No
7 Democrat Yes Yes
8 Democrat No No
I want to have the number of Yes and Nos by party, so something like this
party Yes No
Republican 5 4
Democrat 1 3
Libertarian 5 6
Constitution 2 4
Green 4 1
Eventually I want to make a graph out of this. I have been looking all over and trying some stuff, the closest one has been
res1 <-as.data.frame(aggregate(question1, list(party), table))
which results in this
Group.1 x.No x.Yes
1 constitution 2 12
2 democratic 21 267
3 green 4 21
4 libertarian 12 39
5 republican 27 155
which seems great except when I View(res1)
it only shows as "5 obs. of 2 variables"
. I can only see the Group1 column and the x.No column.
I need both Yes and No columns so I can graph them.
I also tried the plyr
package but it's not working for me, no idea why.
I tried doing it in Rstudio via sqldf using some CASE
functions but as much as I tried changing it around it would have an error.
As you can tell I am an absolute beginner, I appreciate any help you can give me.
Upvotes: 2
Views: 11277
Reputation: 269491
Here are solutions fixing the three approaches (sqldf, aggregate, plyr) attempted in the question. We assume that the data frame input is DF
as defined reproducibly in the Note at the end.
1) sqldf With sqldf:
library(sqldf)
sqldf("select party,
sum(question1 = 'No') + sum(question2 = 'No') as No,
sum(question1 = 'Yes') + sum(question2 = 'Yes') as Yes
from DF
group by party")
or if you have more than 2 questions dynamically create the SQL statement. The verbose=
argument will show the statement it actually sends to SQLite and you can omit it if you don't need that.
library(sqldf)
yes <- paste(sprintf("sum(%s = 'Yes')", names(DF)[-1]), collapse = " + ")
no <- paste(sprintf("sum(%s = 'No')", names(DF)[-1]), collapse = " + ")
fn$sqldf("select party, $no No, $yes Yes from DF group by party", verbose = TRUE)
2) aggregate To do it with aggregate try the following. The aggregate statement creates a two column data frame whose second column is a multicolumn matrix and the last statement, which is optional, converts that to an ordinary 3 column data frame. This also works if there are more than 2 questions.
ag <- aggregate(list(Answer = 1:nrow(DF)), DF["party"],
function(i) c(No = sum(DF[i, -1] == 'No'), Yes = sum(DF[i, -1] == 'Yes')))
do.call("data.frame", ag)
or alternately:
yesNo <- data.frame(Yes = rowSums(DF[-1] == "Yes"), No = rowSums(DF[-1] == "No"))
aggregate(yesNo, DF[1], sum)
3) plyr With the plyr package we can use this code:
library(plyr)
ddply(DF, .(party), summarize,
No = sum(question1 == 'No') + sum(question2 == 'No'),
Yes = sum(question1 == 'Yes') + sum(question2 == 'Yes'))
or if there can be more than 2 questions:
Count_No <- function(data) sum(data[, -1] == "No")
Count_Yes <- function(data) sum(data[, -1] == "Yes")
ddply(DF, .(party), c(No = Count_No, Yes = Count_Yes))
or alternately using yesNo
from (2):
ddply(yesNo, .(party = DF$party), colSums)
The input DF
in reproducible form is:
Lines <- "
party question1 question2
1 Republican Yes No
2 Democrat No Yes
3 Libertarian No No
4 Green No Yes
5 Republican Yes Yes
6 Constitution Yes No
7 Democrat Yes Yes
8 Democrat No No"
DF <- read.table(text = Lines)
Upvotes: 2
Reputation: 887048
We can do this with tidyverse
by converting to 'long' format with gather
, the get the frequency with 'count' and spread
it to 'wide' format
library(tidyverse)
gather(df1, key, val, question1:question2) %>%
count(party, val) %>%
spread(val, n)
Also, using base R
table(data.frame(df1[1], value = unlist(df1[-1])))
Upvotes: 2
Reputation: 76402
Here is a way with xtabs
, after reformating the data.
long <- reshape2::melt(df1, id.vars = "party")
xtabs( ~ party + value, long)
# value
#party No Yes
# Constitution 1 1
# Democrat 3 3
# Green 1 1
# Libertarian 2 0
# Republican 1 3
Data.
df1 <- read.table(text = "
party question1 question2
1 Republican Yes No
2 Democrat No Yes
3 Libertarian No No
4 Green No Yes
5 Republican Yes Yes
6 Constitution Yes No
7 Democrat Yes Yes
8 Democrat No No
", header = TRUE)
Upvotes: 2