Reputation: 451
So I have some data that is formatted like so:
header1 header2
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "candy"
"nocandy" "candy"
"candy" "candy"
etc...
I imported it with candytext <- read.table("candytest.txt", header=TRUE)
And I want to do a chi-squared test to see if there is a difference between the two groups.
When I use the function table(candytext)
I get something like this:
         header2
header1   candy nocandy
  candy     112      39
  nocandy     4      82
But if I run summary(candytext)
I get something like this:
    header1       header2
 candy  :151   candy  :116
 nocandy: 86   nocandy:121
As you can see, the two tables are formatted differently. I can run a chi-squared test on the first table but not on the second, even though the summary table is closer to what I would need to pass to chisq.test().
The table() output looks like it assumes the data are paired, but they are not. If they were paired, that would be fine and I could use McNemar's test on the output of table(candytext).
So how do I create a 2-by-2 matrix that looks like the summary table without typing it out by hand? I realise I could copy the summary table into a matrix, but I want to know how to build it properly with R functions.
Thank you!
Upvotes: 2
Views: 2404
Reputation: 887911
Here I take the summary of each column of df1 with lapply, assuming the columns are factors (judging from the post, that is the case). Calling do.call(data.frame, ...) on the resulting list converts it to a data.frame.
do.call(data.frame,lapply(df1, summary)) #in case a matrix output is needed, just replace `data.frame` with `cbind`
# header1 header2
#candy 1 3
#nocandy 5 3
summary(df1)
# header1 header2
#candy :1 candy :3
#nocandy:5 nocandy:3
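One caveat about the factor assumption: since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE, so the columns may arrive as character vectors, and summary() on a character column reports length, class and mode rather than counts. A minimal sketch of two ways around that, using the same six-row sample (the names raw, raw_fac, counts_tab and counts_fac are only for illustration):

```r
# Same six rows as in the question, read in as plain character columns
raw <- read.table(text = 'header1 header2
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "candy"
"nocandy" "candy"
"candy" "candy"', header = TRUE, stringsAsFactors = FALSE)

# Option 1: table() counts character vectors directly
counts_tab <- do.call(cbind, lapply(raw, table))

# Option 2: convert every column to a factor, then summary() gives counts
raw_fac <- as.data.frame(lapply(raw, factor))
counts_fac <- do.call(cbind, lapply(raw_fac, summary))

counts_tab
counts_fac
```

Both matrices should hold the same per-column counts, so either can be passed on to chisq.test().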
If you need only selected columns from many columns in a dataset,
nm1 <- paste0("header",1:2) #names of columns to do the summary
do.call(`cbind`, lapply(df1[nm1], summary))
# header1 header2
#candy 1 3
#nocandy 5 3
You could also get the summary with data.table:
library(data.table)
DT <- setDT(df1)[, lapply(.SD, summary)] #or
#DT <- setDT(df1)[, lapply(.SD, table)]
DT
# header1 header2
#1: 1 3
#2: 5 3
chisq.test(DT)
# Pearson's Chi-squared test with Yates' continuity correction
#data: DT
#X-squared = 0.375, df = 1, p-value = 0.5403
#Warning message:
#In chisq.test(DT) : Chi-squared approximation may be incorrect
The sample data used above:
df1 <- structure(list(header1 = structure(c(2L, 2L, 2L, 2L, 2L, 1L), .Label = c("candy",
"nocandy"), class = "factor"), header2 = structure(c(2L, 2L,
2L, 1L, 1L, 1L), .Label = c("candy", "nocandy"), class = "factor")), .Names = c("header1",
"header2"), row.names = c(NA, -6L), class = "data.frame")
Upvotes: 1
Reputation: 24623
Try:
> dd = data.frame(sapply(candytext, summary))
> dd
header1 header2
candy 1 3
nocandy 5 3
> chisq.test(dd)
Pearson's Chi-squared test with Yates' continuity correction
data: dd
X-squared = 0.375, df = 1, p-value = 0.5403
Warning message:
In chisq.test(dd) : Chi-squared approximation may be incorrect
>
If you want to select 2 columns from a multicolumn data frame:
> cc = cbind(summary(candytext$header1), summary(candytext$header2))
> cc
[,1] [,2]
candy 1 3
nocandy 5 3
> chisq.test(cc)
Pearson's Chi-squared test with Yates' continuity correction
data: cc
X-squared = 0.375, df = 1, p-value = 0.5403
Warning message:
In chisq.test(cc) : Chi-squared approximation may be incorrect
Used in the following form, table and summary give the same result:
> cbind(table(candytext$header1), table(candytext$header2))
[,1] [,2]
candy 1 3
nocandy 5 3
>
> cbind(summary(candytext$header1), summary(candytext$header2))
[,1] [,2]
candy 1 3
nocandy 5 3
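A related observation, as a sketch: the per-column counts are simply the row and column margins of the cross-tabulation that table(candytext) produces, so they can also be recovered from that table. The example below rebuilds the cross-tabulation from the numbers printed in the question (xt and marg are illustrative names):

```r
# Cross-tabulation as printed in the question (filled column by column)
xt <- as.table(matrix(c(112, 4, 39, 82), nrow = 2,
                      dimnames = list(header1 = c("candy", "nocandy"),
                                      header2 = c("candy", "nocandy"))))

# Row margins are the header1 counts, column margins the header2 counts
marg <- cbind(header1 = rowSums(xt), header2 = colSums(xt))
marg
```

The result matches the counts in the question's summary() output: 151 and 86 for header1, 116 and 121 for header2.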
Upvotes: 1
Reputation: 206576
It sounds like you want to treat your columns as independent samples. If so, this might not be the best data structure. But you could do
#sample data
candytext<-read.table(text='header1 header2
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "candy"
"nocandy" "candy"
"candy" "candy"', header=TRUE)
#summarize
do.call(cbind, lapply(candytext, table))
# header1 header2
# candy 1 3
# nocandy 5 3
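To carry this through to the test, here is a sketch under the assumption that the two columns really are independent samples. Fixing a shared set of levels keeps the rows aligned even if a category never appears in one of the columns (lv, counts and res are illustrative names, and the data frame is redefined so the block runs on its own):

```r
# Same six rows as the sample data, built directly as a data frame
candytext <- data.frame(
  header1 = c("nocandy", "nocandy", "nocandy", "nocandy", "nocandy", "candy"),
  header2 = c("nocandy", "nocandy", "nocandy", "candy", "candy", "candy")
)

# Shared levels so every column is counted over the same categories
lv <- sort(unique(unlist(lapply(candytext, as.character))))
counts <- sapply(candytext, function(x) table(factor(x, levels = lv)))
counts

# With counts this small, chisq.test() warns that the approximation
# may be incorrect; the warning is suppressed here only to keep output short
res <- suppressWarnings(chisq.test(counts))
res$p.value
```

With so few observations, fisher.test(counts) may be a more suitable choice than the chi-squared approximation.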
Upvotes: 1