Reputation: 23
I have a large data frame with many variables. Many are Likert-scale answers; others are logical variables indicating which schools an observation belongs to (these can overlap).
Example:
Q1 <- c(1,2,2,4,3,5)
Q2 <- c(3,4,3,5,4,5)
A <- c(TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)
B <- c(FALSE,TRUE,FALSE,TRUE,FALSE,FALSE)
df <- data.frame(Q1,Q2, A, B)
The output I want is a contingency table:
    Q1
    1 2 3 4 5
A   1 1 0 1 1
B   0 1 0 1 0
where I can do a chi-squared test between schools (here A and B). Nothing I have tried works.
I think there may be an answer in what I have read online, but I lack the knowledge to recognize it!
Upvotes: 2
Views: 1341
Reputation: 887068
We can use dplyr/tidyr. We group by 'Q1', get the sum of the 'A' and 'B' columns using summarise_each, convert the 'wide' to 'long' format with gather, and reshape it back to 'wide' with spread.
library(dplyr)
library(tidyr)
df %>%
  group_by(Q1) %>%
  summarise_each(funs(sum(.)), A:B) %>%
  gather(Var, Val, -Q1) %>%
  spread(Q1, Val)
# Var 1 2 3 4 5
# (fctr) (int) (int) (int) (int) (int)
# 1 A 1 1 0 1 1
# 2 B 0 1 0 1 0
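In more recent tidyverse releases, summarise_each, funs, gather and spread are deprecated or superseded. The same steps can be sketched with across, pivot_longer and pivot_wider, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 are installed. The name res is only for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  Q1 = c(1, 2, 2, 4, 3, 5),
  Q2 = c(3, 4, 3, 5, 4, 5),
  A  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  B  = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

res <- df %>%
  group_by(Q1) %>%
  # Summing a logical column counts its TRUE values
  summarise(across(A:B, sum)) %>%
  # Long format: one row per (Q1, school) pair
  pivot_longer(-Q1, names_to = "Var", values_to = "Val") %>%
  # Back to wide format: one row per school, one column per Q1 value
  pivot_wider(names_from = Q1, values_from = Val)

res
```

The result should have one row per school and one column per observed Q1 value, matching the table above.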
A base R option is xtabs after converting to long format:
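# Stack the school indicator columns into long format: one row per (observation, school)
d1 <- data.frame(Q1 = rep(df$Q1, 2),
                 Var = rep(names(df)[3:4], each = nrow(df)),
                 Val = unlist(df[3:4]))
# Sum the indicators within each school-by-answer cell
xtabs(Val ~ Var + Q1, d1)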
# Q1
#Var 1 2 3 4 5
# A 1 1 0 1 1
# B 0 1 0 1 0
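Since the question asks for a chi-squared test on this table, here is a minimal base R sketch of that last step. The names tab, tab_nz and res are illustrative, and dropping all-zero columns and using a simulated p-value are my assumptions about how to handle such small counts, not something the question specifies. Note also that observations belonging to both schools are counted in both rows, which conflicts with the independence assumption of the test, so the result should be read with caution.

```r
# Example data from the question
df <- data.frame(
  Q1 = c(1, 2, 2, 4, 3, 5),
  Q2 = c(3, 4, 3, 5, 4, 5),
  A  = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  B  = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Long format, then school-by-answer counts as in the answer above
d1 <- data.frame(
  Q1  = rep(df$Q1, 2),
  Var = rep(c("A", "B"), each = nrow(df)),
  Val = as.integer(unlist(df[c("A", "B")]))
)
tab <- xtabs(Val ~ Var + Q1, data = d1)

# Drop answer levels chosen by nobody in either school (here Q1 == 3):
# an all-zero column has zero expected counts and breaks the statistic
tab_nz <- tab[, colSums(tab) > 0, drop = FALSE]

# With counts this small the asymptotic chi-squared approximation is poor,
# so request a Monte Carlo p-value instead
set.seed(1)
res <- chisq.test(tab_nz, simulate.p.value = TRUE, B = 2000)
res

# Fisher's exact test is a common alternative for sparse tables
fisher.test(tab_nz)
```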
Upvotes: 2