My goal is to get a table that, for a list of categorical variables, returns (from leftmost column to rightmost column): the categorical variable name, the categorical variable level, the frequency for the first level of a binary grouping variable, the frequency for the second level of the binary grouping variable, the chi-squared test statistic, the p-value, and the testing method. An example of the output I want is shown further down. The current code and output are for a single categorical variable. I'm trying not to put the cart before the horse: right now, getting the right format for a single variable would be good. After that I'll work on doing it for a list of variables and then rbind the results together.
The code below shows what I have figured out so far. I'm fairly certain there is an easier way to do this. I've been told about tables::tabular, but couldn't get it to do exactly what I wanted. I currently can't figure out the reshape (or how to get rid of the duplicates in the final three columns once that works, but I'm not there yet).
Any help using the current code, or a different method, would be very much appreciated.
#make data (I couldn't get return() to work, so I used <<-)
get.data<-function(){
set.seed(1)
cat1 <-sample(c(1,2), 100, replace=T)
cont1<-rnorm(100, 25, 8)
cont2<-rnorm(100, 0, 1)
cont3<-rnorm(100, 6, 14.23)
cont4<-rnorm(100, 25, 8)*runif(5, 0.1, 1)
cat2<-sample(c(1,2,3,4),100,replace=TRUE)
cat3<-sample(c(1,2,3,4,5),100,replace=TRUE)
cat4<-sample(c("Caucasian","African American", "Latino", "Multi-Racial",
"No Response"),100,replace=TRUE)
group<-sample(c(0,1), 100, replace=T)
sex<-sample(c("male", "female"), 100, replace=T)
one <<-data.frame(group, sex,cat1, cont1, cont2, cont3, cont4,cat2,cat3,cat4)
}
get.data()
#getting the two bits of data I would like
attach(one)
long <- (with(one, table(cat2,group)))
test<-with(one, chisq.test(cat2,group))
kk<-c(test$statistic,test$p.value,test$method)
detach(one)
#merging them together
res<-merge(as.data.frame(as.matrix(long)), as.data.frame(as.matrix(kk)),
all=TRUE, sort=FALSE)
#unsuccessfully reshaping the data
wider <- reshape(as.data.frame(res), idvar = cat2,
timevar = "V1", direction = "wide")
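A side note on the comment about return(): an R function returns the value of its last expression, so get.data() can simply end with the data.frame() call and let the caller assign the result. A minimal sketch, keeping the original column definitions (with "No Response" as a single string):

```r
# Return the data frame instead of assigning it globally with <<-.
get.data <- function() {
  set.seed(1)
  cat1  <- sample(c(1, 2), 100, replace = TRUE)
  cont1 <- rnorm(100, 25, 8)
  cont2 <- rnorm(100, 0, 1)
  cont3 <- rnorm(100, 6, 14.23)
  cont4 <- rnorm(100, 25, 8) * runif(5, 0.1, 1)
  cat2  <- sample(c(1, 2, 3, 4), 100, replace = TRUE)
  cat3  <- sample(c(1, 2, 3, 4, 5), 100, replace = TRUE)
  cat4  <- sample(c("Caucasian", "African American", "Latino",
                    "Multi-Racial", "No Response"), 100, replace = TRUE)
  group <- sample(c(0, 1), 100, replace = TRUE)
  sex   <- sample(c("male", "female"), 100, replace = TRUE)
  # The value of the last expression is the function's return value;
  # wrapping it in return(...) would do the same thing.
  data.frame(group, sex, cat1, cont1, cont2, cont3, cont4, cat2, cat3, cat4)
}

one <- get.data()  # the caller decides where the result goes
str(one)
```

With this, one <- get.data() replaces the <<- assignment and the function no longer writes into the global environment.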
Here is what the output from 'res' looks like:
# cat2 group Freq V1
#1 1 0 17 1.16345446805217
#2 2 0 11 1.16345446805217
#3 3 0 13 1.16345446805217
#4 4 0 13 1.16345446805217
#5 1 1 12 1.16345446805217
#6 2 1 13 1.16345446805217
#7 3 1 9 1.16345446805217
#8 4 1 12 1.16345446805217
#9 1 0 17 0.761782111152171
#10 2 0 11 0.761782111152171
#11 3 0 13 0.761782111152171
#12 4 0 13 0.761782111152171
#13 1 1 12 0.761782111152171
#14 2 1 13 0.761782111152171
#15 3 1 9 0.761782111152171
#16 4 1 12 0.761782111152171
#17 1 0 17 Pearson's Chi-squared test
#18 2 0 11 Pearson's Chi-squared test
#19 3 0 13 Pearson's Chi-squared test
#20 4 0 13 Pearson's Chi-squared test
#21 1 1 12 Pearson's Chi-squared test
#22 2 1 13 Pearson's Chi-squared test
#23 3 1 9 Pearson's Chi-squared test
#24 4 1 12 Pearson's Chi-squared test
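The V1 column above is all text because c() has to pick one common type for the statistic, the p-value and the method name, and that type is character. A sketch that keeps them as separate, properly typed columns instead; the data here is a hypothetical stand-in for one, not the question's exact draws:

```r
# Hypothetical stand-in for the question's data frame `one`.
set.seed(1)
one <- data.frame(group = sample(c(0, 1), 100, replace = TRUE),
                  cat2  = sample(1:4, 100, replace = TRUE))

# with() already evaluates inside `one`, so attach()/detach() are not needed.
test <- with(one, chisq.test(cat2, group))

# c() needs a single type, so the numbers become strings here.
kk <- c(test$statistic, test$p.value, test$method)
class(kk)  # "character"

# A one-row data frame keeps each piece in its own type.
stats <- data.frame(Test.Stat = unname(test$statistic),
                    p.value   = test$p.value,
                    method    = test$method,
                    stringsAsFactors = FALSE)
str(stats)
```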
HERE IS WHAT I WANT THE OUTPUT TO LOOK LIKE:
Variable Response Group1.Freq Group2.Freq Test.Stat p.value method
Cat2     1        17          12          1.16      0.761   Pearson's Chi...
         2        11          13
         3        13          9
         4        13          12
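The first four columns of this layout are just the contingency table in its own wide shape, so reshape() may not be needed at all: as.data.frame() on a table returns the long cat2/group/Freq form, while as.data.frame.matrix() keeps one row per level and one column per group. A sketch on hypothetical stand-in data:

```r
# Hypothetical stand-in for the question's data frame `one`.
set.seed(1)
one <- data.frame(group = sample(c(0, 1), 100, replace = TRUE),
                  cat2  = sample(1:4, 100, replace = TRUE))

long <- with(one, table(cat2, group))

# Keep the table's wide layout: rows = cat2 levels, columns = group levels.
wide <- as.data.frame.matrix(long)
names(wide) <- c("Group1.Freq", "Group2.Freq")

# Move the level labels from the row names into a Response column.
wide <- data.frame(Response = rownames(wide), wide, stringsAsFactors = FALSE)
rownames(wide) <- NULL
wide
```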
NEW ISSUE: I used Ram's suggestion to make a function so that I could build a data.frame for multiple categorical variables, and came up with the code below. But the output gets messed up during the lapply and rbind steps. I'm wondering how to fix this. Again, the output is at the bottom.
get.data<-function(){
set.seed(1)
cat1 <-sample(c(1,2), 100, replace=T)
cont1<-rnorm(100, 25, 8)
cont2<-rnorm(100, 0, 1)
cont3<-rnorm(100, 6, 14.23)
cont4<-rnorm(100, 25, 8)*runif(5, 0.1, 1)
cat2<-sample(c(1,2,3,4),100,replace=TRUE)
cat3<-sample(c(1,2,3,4,5),100,replace=TRUE)
cat4<-sample(c("Caucasian","African American", "Latino", "Multi-Racial",
"No Response"),100,replace=TRUE)
group<-sample(c(0,1), 100, replace=T)
sex<-sample(c("male", "female"), 100, replace=T)
one <<-data.frame(group, sex,cat1, cont1, cont2, cont3, cont4,cat2,cat3,cat4)
}
get.data()
make.table<-function(catvars,group,data){
attach(data)
get.chi.stuff<-function(cat, group){
long <- table(cat,group)
test<-chisq.test(cat,group)
kk<-c(test$statistic,test$p.value,test$method)
res <- data.frame(matrix(NA,nrow(long),7))
names(res) <- c("Variable", "Response", "Group1.Freq", "Group2.Freq",
"Test.Stat", "p.value", "method")
res[1,1] <- deparse(substitute(cat))
res[,2] <- row.names(long)
res[,3:4] <- long[,1:2]
res[1,5:7] <- kk
return(res)
}
tables<<-do.call(rbind,lapply(data[,catvars],get.chi.stuff,group=group))
detach(data)
}
catvars<-c("cat2","cat3","cat4","sex")
make.table(catvars=catvars,group=group, data=one)
OUTPUT (the problems are the row.names and the Variable column; the rest looks fine):
row.names Variable Response         Group1.Freq Group2.Freq Test.Stat        p.value            method
cat2.1    X[[1L]]  1                17          12          1.16345446805217 0.761782111152171  Pearson's Chi-squared test
cat2.2    NA       2                11          13          NA               NA                 NA
cat2.3    NA       3                13          9           NA               NA                 NA
cat2.4    NA       4                13          12          NA               NA                 NA
cat3.1    X[[2L]]  1                8           15          5.68288366946583 0.224115426983988  Pearson's Chi-squared test
cat3.2    NA       2                10          7           NA               NA                 NA
cat3.3    NA       3                14          11          NA               NA                 NA
cat3.4    NA       4                8           7           NA               NA                 NA
cat3.5    NA       5                14          6           NA               NA                 NA
cat4.1    X[[3L]]  African American 9           18          8.73180996607079 0.0681639164530817 Pearson's Chi-squared test
cat4.2    NA       Caucasian        14          5           NA               NA                 NA
cat4.3    NA       Latino           6           7           NA               NA                 NA
cat4.4    NA       Multi-Racial     14          9           NA               NA                 NA
cat4.5    NA       No Response      11          7           NA               NA                 NA
sex.1     X[[4L]]  female           30          17          2.74327353028067 0.0976645121155453 Pearson's Chi-squared test with Yates' continuity correction
sex.2     NA       male             24          29          NA               NA                 NA
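Both problems come from how lapply() calls the function: it passes each column as X[[1L]], X[[2L]], ..., so deparse(substitute(cat)) sees that expression rather than the column name, and do.call(rbind, ...) on the resulting named list builds row names such as cat2.1. One way around it, sketched under the assumption that the variables are passed by name, is to loop over the names themselves and reset the row names afterwards. chi.rows() is a hypothetical helper and the data is a stand-in:

```r
# Hypothetical stand-in for the question's data frame `one`.
set.seed(1)
one <- data.frame(group = sample(c(0, 1), 100, replace = TRUE),
                  cat2  = sample(1:4, 100, replace = TRUE),
                  sex   = sample(c("male", "female"), 100, replace = TRUE),
                  stringsAsFactors = FALSE)

# Hypothetical helper: takes a column *name*, so it can label the rows itself.
chi.rows <- function(var, data, group) {
  long <- table(data[[var]], data[[group]])
  test <- chisq.test(data[[var]], data[[group]])
  n  <- nrow(long)
  na <- rep(NA, n - 1)   # blanks below the first row
  data.frame(Variable    = c(var, na),
             Response    = rownames(long),
             Group1.Freq = as.vector(long[, 1]),
             Group2.Freq = as.vector(long[, 2]),
             Test.Stat   = c(unname(test$statistic), na),
             p.value     = c(test$p.value, na),
             method      = c(test$method, na),
             stringsAsFactors = FALSE)
}

catvars <- c("cat2", "sex")
tables <- do.call(rbind, lapply(catvars, chi.rows, data = one, group = "group"))
# rbind() derives row names from the list element names when there are any;
# resetting them gives plain 1..n either way.
rownames(tables) <- NULL
tables
```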
Since you are using merge, and your two data frames have no columns in common, merge pairs every row of one with every row of the other, which is why the values get recycled. That is not what you want for your res. You have already created all the components you need in the variables long, kk and test, so now it is just a matter of stitching them together in the specific format that you want.
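A small illustration of that recycling: when the two data frames share no column names, merge() has nothing to match on and returns every combination of rows, which is how 8 count rows and 3 kk values became the 24 rows in the question. The values below are made up for the demonstration:

```r
# Two data frames with no column names in common (made-up values).
counts <- data.frame(cat2 = c(1, 2), Freq = c(17, 11))
stats  <- data.frame(V1 = c("1.16", "0.76", "Pearson's Chi-squared test"))

# With nothing to join on, merge() returns the Cartesian product.
res <- merge(counts, stats)
dim(res)  # 6 rows (2 x 3), 3 columns
res
```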
This is not very elegant, because we are constructing the desired results by hand, column by column. You could throw all of this into a function.
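Along those lines, here is a sketch of such a function. It relies on the object returned by chisq.test() already carrying the observed counts, the statistic, the p-value and the method; when chisq.test() is called with bare variable names, as in with(one, chisq.test(cat2, group)), the dimnames of the observed table are also named after the variables, which is what the names(attr(test$observed, "dimnames")[1]) line uses. make.res() is a hypothetical name, the data is a stand-in, and assigning a list to res[1, 5:7] keeps the statistic and p-value numeric instead of character:

```r
# Hypothetical helper: builds the table from a chisq.test() result alone.
make.res <- function(test) {
  obs <- test$observed
  n   <- nrow(obs)
  res <- data.frame(matrix(NA, n, 7))
  names(res) <- c("Variable", "Response", "Group1.Freq", "Group2.Freq",
                  "Test.Stat", "p.value", "method")
  res[1, 1]   <- names(dimnames(obs))[1]   # e.g. "cat2"
  res[, 2]    <- rownames(obs)
  res[, 3:4]  <- obs[, 1:2]
  # A list keeps each value's own type; c() would turn them all into text.
  res[1, 5:7] <- list(unname(test$statistic), test$p.value, test$method)
  res
}

# Hypothetical stand-in for the question's data frame `one`.
set.seed(1)
one <- data.frame(group = sample(c(0, 1), 100, replace = TRUE),
                  cat2  = sample(1:4, 100, replace = TRUE))
res <- make.res(with(one, chisq.test(cat2, group)))
res
```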
res <- data.frame(matrix(NA,nrow(long),7))
names(res) <- c("Variable", "Response", "Group1.Freq", "Group2.Freq",
"Test.Stat", "p.value", "method")
res[1,1] <- names(attr(test$observed, "dimnames")[1])
res[,2] <- row.names(long)
res[,3:4] <- long[,1:2]
res[1,5:7] <- kk
res
# Variable Response Group1.Freq Group2.Freq Test.Stat
# 1 cat2 1 17 12 1.16345446805217
# 2 <NA> 2 11 13 <NA>
# 3 <NA> 3 13 9 <NA>
# 4 <NA> 4 13 12 <NA>
# p.value method
# 1 0.761782111152171 Pearson's Chi-squared test
# 2 <NA> <NA>
# 3 <NA> <NA>
# 4 <NA> <NA>
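To get closer to the display in the question (rounded numbers, blanks instead of NA), the NAs can be swapped for empty strings as a last, display-only step, since that turns every column into text. A sketch using the values from the output above, stored as numbers:

```r
# Values copied from the output above, with the statistics kept numeric.
res <- data.frame(Variable    = c("cat2", NA, NA, NA),
                  Response    = c("1", "2", "3", "4"),
                  Group1.Freq = c(17, 11, 13, 13),
                  Group2.Freq = c(12, 13, 9, 12),
                  Test.Stat   = c(1.16345446805217, NA, NA, NA),
                  p.value     = c(0.761782111152171, NA, NA, NA),
                  method      = c("Pearson's Chi-squared test", NA, NA, NA),
                  stringsAsFactors = FALSE)

# For display only: round the statistics, then turn every NA into "".
shown <- res
shown$Test.Stat <- round(shown$Test.Stat, 2)
shown$p.value   <- round(shown$p.value, 3)
shown[] <- lapply(shown, function(col) ifelse(is.na(col), "", as.character(col)))
print(shown, row.names = FALSE)
```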