jelijelidjango

Reputation: 451

R Chi-Squared Table Format

So I have some data that is formatted like so:

header1    header2
"nocandy"  "nocandy"
"nocandy"  "nocandy"
"nocandy"  "nocandy"
"nocandy"    "candy"
"nocandy"    "candy"
"candy"    "candy"
etc...

I imported it with candytext <- read.table("candytest.txt", header=TRUE), and I want to do a chi-squared test to see if there is a difference between the two groups. When I use the function table(candytext) I get something like this:

         header2
header1   candy nocandy
  candy     112      39
  nocandy     4      82

But if I run summary(candytext) I get something like this:

    header1       header2   
 candy  :151   candy  :116  
 nocandy: 86   nocandy:121 

As you can see, the two tables are formatted differently. I can run a chi-squared test on the first table but not on the second, yet the summary table is closer to what I need to pass to chisq.test(). The first table (from table(candytext)) looks like it's assuming that the data are paired, but they are not. If they were paired, that would be fine and I could use McNemar's test on the output of table(candytext). So how do I create a 2-by-2 matrix that looks like the summary table without typing it out by hand? I realise I could copy the summary table into a matrix, but I want to know how to do the conversion properly with R functions.

Thank you!

Upvotes: 2

Views: 2404

Answers (3)

akrun

Reputation: 887911

Here I compute summary on each column of df1 with lapply, assuming the columns are factors (from the post, I guess that is the case). Calling do.call(data.frame, ...) on the resulting list converts it to a data.frame.

  do.call(data.frame,lapply(df1, summary)) #in case a matrix output is needed, just replace `data.frame` with `cbind`
  #          header1 header2
  #candy         1       3
  #nocandy       5       3


  summary(df1)
  #   header1     header2 
  #candy  :1   candy  :3  
  #nocandy:5   nocandy:3  

If you need only selected columns from a larger dataset:

  nm1 <- paste0("header",1:2) #names of columns to do the summary
   do.call(`cbind`, lapply(df1[nm1], summary))
   #        header1 header2
   #candy         1       3
   #nocandy       5       3

You could also do summary with data.table

  library(data.table)
  DT <- setDT(df1)[, lapply(.SD, summary)]
  # or: DT <- setDT(df1)[, lapply(.SD, table)]
  DT
   #    header1 header2
   #1:       1       3
   #2:       5       3

 chisq.test(DT)

 #    Pearson's Chi-squared test with Yates' continuity correction

  #data:  DT
  #X-squared = 0.375, df = 1, p-value = 0.5403

  #Warning message:
  #In chisq.test(DT) : Chi-squared approximation may be incorrect

data

df1 <- structure(list(header1 = structure(c(2L, 2L, 2L, 2L, 2L, 1L), .Label = c("candy", 
"nocandy"), class = "factor"), header2 = structure(c(2L, 2L, 
2L, 1L, 1L, 1L), .Label = c("candy", "nocandy"), class = "factor")), .Names = c("header1", 
"header2"), row.names = c(NA, -6L), class = "data.frame")
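
A note on the factor assumption used throughout this answer: since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE, so the columns may arrive as character vectors, for which summary() reports length, class and mode instead of counts. A minimal sketch of one way around that, assuming the same six rows as the question's excerpt (the name `raw` is only illustrative):

```r
# Read the six example rows; stringsAsFactors = FALSE mimics the R >= 4.0.0 default.
raw <- read.table(text = 'header1 header2
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "nocandy"
"nocandy" "candy"
"nocandy" "candy"
"candy" "candy"', header = TRUE, stringsAsFactors = FALSE)

# Convert every column to a factor so that summary() returns level counts.
raw[] <- lapply(raw, factor)

# One column of counts per original column, as a matrix.
counts <- do.call(cbind, lapply(raw, summary))
counts
#         header1 header2
# candy         1       3
# nocandy       5       3
```

Alternatively, passing stringsAsFactors = TRUE to read.table() should avoid the conversion step altogether.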

Upvotes: 1

rnso

Reputation: 24623

Try:

> dd = data.frame(sapply(candytext, summary))
> dd
        header1 header2
candy         1       3
nocandy       5       3

> chisq.test(dd)
        Pearson's Chi-squared test with Yates' continuity correction

data:  dd
X-squared = 0.375, df = 1, p-value = 0.5403

Warning message:
In chisq.test(dd) : Chi-squared approximation may be incorrect

If you want to select 2 columns from a multicolumn data frame:

> cc = cbind(summary(candytext$header1), summary(candytext$header2))

> cc
        [,1] [,2]
candy      1    3
nocandy    5    3

> chisq.test(cc)

        Pearson's Chi-squared test with Yates' continuity correction

data:  cc
X-squared = 0.375, df = 1, p-value = 0.5403

Warning message:
In chisq.test(cc) : Chi-squared approximation may be incorrect

In the following form, table and summary give the same result:

> cbind(table(candytext$header1), table(candytext$header2))
        [,1] [,2]
candy      1    3
nocandy    5    3
> 
> cbind(summary(candytext$header1), summary(candytext$header2))
        [,1] [,2]
candy      1    3
nocandy    5    3
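
One thing to watch with the cbind() forms above: a column that never contains one of the categories yields a single count, and the rows no longer line up. A hedged sketch with made-up data (`toy` and `lv` are illustrative names, not from the question) that pins both columns to the same levels before counting:

```r
# Made-up data: header2 never contains "nocandy".
toy <- data.frame(
  header1 = c("candy", "nocandy", "nocandy"),
  header2 = c("candy", "candy", "candy"),
  stringsAsFactors = FALSE
)

# Shared level set, so every column is counted over the same categories.
lv <- c("candy", "nocandy")
counts <- do.call(cbind, lapply(toy, function(x) table(factor(x, levels = lv))))
counts
#         header1 header2
# candy         1       3
# nocandy       2       0
```

With explicit levels, an absent category shows up as a zero rather than silently disappearing.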

Upvotes: 1

MrFlick

Reputation: 206576

It sounds like you want to treat your columns as independent samples. If so, this might not be the best data structure, but you could do:

#sample data
candytext<-read.table(text='header1    header2
 "nocandy"  "nocandy"
 "nocandy"  "nocandy"
 "nocandy"  "nocandy"
 "nocandy"    "candy"
 "nocandy"    "candy"
 "candy"    "candy"', header=TRUE)

#summarize
do.call(cbind, lapply(candytext, table))
#         header1 header2
# candy         1       3
# nocandy       5       3
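
To finish the job on the question's full data, the count matrix can be passed straight to chisq.test(). The sketch below rebuilds a data frame whose cross-tabulation matches the table() output shown in the question (the rep() construction is only a stand-in for the asker's file, which isn't available), and it takes the asker's word that the two columns are independent samples:

```r
# Stand-in for the asker's file: rows reconstructed from the question's
# cross-tabulation (112, 39, 4, 82).
candytext <- data.frame(
  header1 = rep(c("candy", "candy", "nocandy", "nocandy"), times = c(112, 39, 4, 82)),
  header2 = rep(c("candy", "nocandy", "candy", "nocandy"), times = c(112, 39, 4, 82)),
  stringsAsFactors = FALSE
)

# Per-column counts side by side; this matches the summary() layout in the question.
counts <- do.call(cbind, lapply(candytext, table))
counts
#         header1 header2
# candy       151     116
# nocandy      86     121

# Chi-squared test on the 2-by-2 count matrix (Yates' correction is applied by default).
result <- chisq.test(counts)
result
```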

Upvotes: 1
