tom

Reputation: 1077

Efficiently subset tables with factor variables in R

I'm trying to create tables from survey data, but the solution I've come up with isn't manageable for all of the tables I need to create.

I have a survey of different populations, parties, and their opinions on certain issues. Below are the sample data and my cumbersome, (almost) working solution. The result I'm looking for is in the "ideal.table" data.frame (printed at the end of this question).

pop <- c("elite", "elite", "public", "public", "public", "public")
party <- c("D", "R", "R", "D", "D", "R")
opinion <- c("pro", "con", "pro", "con", "pro", "pro")

df <- data.frame(pop, party, opinion)

party.table <- prop.table(table(df[df$pop=="public",][["party"]], df[df$pop=="public",][["opinion"]]),2)
elite.table <- prop.table(table(df[df$pop=="elite",][["opinion"]]))
public.table <- prop.table(table(df[df$pop=="public",][["opinion"]]))

group <- c("R", "D", "elite", "public")
percent.pro <- c(0.3, 0.6, 0.5, 0.75)
percent.con <- c(0.7, 0.4, 0.5, 0.25)

ideal.table <- data.frame(group, percent.pro, percent.con)

library(dplyr)
library(tidyr)

# create data frames from tables
x = data.frame(elite.table)
names(x) = c("elite","value")

y = data.frame(party.table) %>% spread(Var2,Freq)
names(y)[1] = "group"

z = data.frame(public.table)
names(z)[1] = "group"

# join data frames
x %>% inner_join(y, by="group") %>% inner_join(z, by="group")

I haven't figured out how to make this work yet, and even if I find a solution for this particular dataset, I'm sometimes combining multiple two-dimensional tables with more groups than those presented here. Is there a better way to get crosstab proportions for different subsets of data?

   group percent.pro percent.con
1      R        0.30        0.70
2      D        0.60        0.40
3  elite        0.50        0.50
4 public        0.75        0.25

Thanks for any help!

Upvotes: 0

Views: 403

Answers (1)

bramtayl

Reputation: 4024

library(dplyr)
library(tidyr)
df %>%
  gather(variable, group, -opinion) %>%
  group_by(variable, group) %>%
  summarize(percent.pro = sum(opinion == "pro") / n()) %>%
  mutate(percent.con = 1 - percent.pro)
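
For what it's worth, gather() is superseded in current tidyr, so here is a rough sketch of the same idea using pivot_longer() instead. It assumes tidyr >= 1.0 and dplyr >= 1.0 (for the .groups argument); the name `result` is only for illustration. It stacks every column except opinion into a single grouping column, then takes the share of "pro" within each group.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  pop     = c("elite", "elite", "public", "public", "public", "public"),
  party   = c("D", "R", "R", "D", "D", "R"),
  opinion = c("pro", "con", "pro", "con", "pro", "pro"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # stack all grouping columns (everything except opinion) into long form
  pivot_longer(-opinion, names_to = "variable", values_to = "group") %>%
  group_by(variable, group) %>%
  # mean of a logical vector is the proportion of TRUE values
  summarise(percent.pro = mean(opinion == "pro"), .groups = "drop") %>%
  mutate(percent.con = 1 - percent.pro)

result
```

Note that this computes the party rows over all respondents, whereas party.table in the question only uses the "public" rows. If that restriction matters, the party rows would need a filter on pop before summarising.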

Upvotes: 1
