tomclark

Reputation: 3

Creating a table from dataset to run Chisq test

I want to analyse categorical data with a chi-squared test in R. I am working with transplant data and want to compare outcomes between patients who were on or off bypass at surgery. I asked a similar question before about my categorical variables and was given this answer to test for a group difference by sex:

df <- read.table(text="Group, Age, Sex, Height, Weight, Diagnosis, Blood loss, Intubation time, Survival
                 On bypass,59,Male,165,102,Diagnosis 1,57,53,29
                 On bypass,44,Female,164,140,Diagnosis 1,114,15,35
                 On bypass,45,Male,165,119,Diagnosis 2,118,31,81
                 On bypass,26,Male,178,125,Diagnosis 1,171,36,31
                 On bypass,41,Female,177,105,Diagnosis 1,76,53,91
                 On bypass,43,Male,161,119,Diagnosis 3,97,38,63
                 Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
                 Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
                 Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
                 Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
                 Off bypass,63,Male,172,132,Diagnosis 2,32,46,10  ", header = TRUE, sep = ",")

library(dplyr)

# tally number of participants in each Group by Sex
tab <- tally(group_by(df, Group, Sex))
chisq.test(tab$n)  # test for Group differences by Sex
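
A hedged note on the snippet above, not a correction of the original answer: when `chisq.test()` is given a plain vector of counts it runs a goodness-of-fit test against equal proportions. A test of whether Sex is associated with Group needs the two-way table. The sketch below uses counts tallied by hand from the `df` rows above; the names `counts`, `gof`, `tab2` and `indep` are illustrative only.

```r
# Counts tallied by hand from df above:
# Off bypass: 3 Female, 2 Male; On bypass: 2 Female, 4 Male
counts <- c(3, 2, 2, 4)

# A bare vector is treated as one categorical variable with four cells,
# so this is a goodness-of-fit test against equal proportions (df = 3).
# suppressWarnings() hides the small-sample warning for this toy data.
gof <- suppressWarnings(chisq.test(counts))

# A 2 x 2 table of Group by Sex gives the test of independence (df = 1).
tab2 <- matrix(counts, nrow = 2, byrow = TRUE,
               dimnames = list(Group = c("Off bypass", "On bypass"),
                               Sex   = c("Female", "Male")))
indep <- suppressWarnings(chisq.test(tab2))

gof$parameter    # degrees of freedom: 3
indep$parameter  # degrees of freedom: 1
```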

I have used this to test for differences in variables with two levels (such as sex, the two levels being male and female). However, some of my variables have more than two levels, for example diagnosis (see my example data set below). For these variables I want to compare each diagnosis between the on/off bypass groups.

Here is my exampledata:

exampledata <- read.table(text="ID,Bypass,Sex,Age,Height,Weight,Diagnosis
                 559,Bypass on,Male,33,167,78,Other
                 662,Bypass off,Male,63,175,55,UIP
                 956,Bypass off,Female,40,158,88,Other
                 460,Bypass on,Female,34,173,86,UIP
                 153,Bypass off,Female,31,171,74,UIP
                 192,Bypass off,Male,33,163,64,Other
                 658,Bypass on,Male,50,161,60,Other
                 529,Bypass off,Female,55,179,75,Cystic fibrosis
                 981,Bypass on,Male,36,166,81,Other
                 367,Bypass on,Female,46,152,85,PH
                 728,Bypass off,Male,30,169,88,Other
                 185,Bypass on,Female,65,162,57,UIP
                 160,Bypass on,Male,54,176,62,PH
                 175,Bypass off,Male,29,156,78,Other
                 167,Bypass off,Male,20,175,86,PH
                 149,Bypass on,Male,24,169,82,Cystic fibrosis
                 446,Bypass off,Male,38,162,69,PH
                 667,Bypass on,Male,55,150,55,Cystic fibrosis
                 488,Bypass off,Female,41,162,56,Other
                 169,Bypass off,Female,60,154,55,Cystic fibrosis
                 787,Bypass on,Male,41,169,52,Cystic fibrosis
                 443,Bypass on,Male,35,159,77,Other
                 593,Bypass off,Female,28,167,53,Other
                 653,Bypass off,Female,22,176,75,Other
                 685,Bypass off,Male,26,170,88,Cystic fibrosis
                 676,Bypass on,Male,32,172,58,Cystic fibrosis
                 556,Bypass off,Male,26,168,88,PH
                 943,Bypass off,Male,40,176,80,PH
                 940,Bypass off,Male,37,180,69,Cystic fibrosis
                 740,Bypass on,Female,58,153,72,UIP
                 624,Bypass on,Female,40,156,81,UIP
                 194,Bypass on,Male,33,155,60,PH
                 162,Bypass on,Female,23,170,64,PH
                 283,Bypass off,Male,60,180,61,Other
                 404,Bypass on,Male,26,170,63,PH
                 312,Bypass on,Male,36,171,83,PH
                 995,Bypass on,Female,48,161,67,Other
                 254,Bypass on,Female,35,175,62,UIP
                 364,Bypass on,Female,65,161,55,UIP
                 771,Bypass off,Male,37,157,72,Other
                 698,Bypass on,Male,31,163,87,PH
                 286,Bypass on,Female,60,154,80,UIP
                 189,Bypass off,Male,42,168,57,PH
                 463,Bypass on,Female,32,176,50,PH
                 634,Bypass off,Male,53,152,64,UIP
                 198,Bypass off,Female,20,171,70,Cystic fibrosis
                 356,Bypass off,Male,55,161,72,Cystic fibrosis
                 254,Bypass on,Female,49,169,61,UIP
                 921,Bypass on,Male,47,152,63,UIP
                 185,Bypass on,Male,63,174,71,Other
                 953,Bypass on,Male,32,169,63,PH
                 336,Bypass on,Female,33,164,52,Other
                 651,Bypass off,Female,55,172,54,PH
                 200,Bypass off,Male,43,179,55,UIP
                 625,Bypass off,Male,43,158,75,Other
                 986,Bypass on,Female,32,151,81,Other
                 437,Bypass off,Female,53,152,57,Other
                 433,Bypass on,Male,35,180,74,Cystic fibrosis
                 673,Bypass on,Female,27,159,58,Cystic fibrosis
                 901,Bypass off,Male,30,169,72,PH", header = TRUE, sep = ",")

I am using this to create a table of counts:

mytable <- table(exampledata$Bypass,exampledata$Diagnosis)

This returns:

             Cystic fibrosis Other PH UIP
  Bypass off               6    11  7   4
  Bypass on                6     8  9   9

However, as I wish to look at each diagnosis individually the output I require is

             Cystic fibrosis Not Cystic fibrosis
  Bypass off               6                  22
  Bypass on                6                  26

I am hoping that using this output I can compare the number of patients that have Cystic fibrosis in the on/off pump groups.

Ideally I would then be able to quickly repeat this for each diagnosis.

If someone believes there is a better way of doing this (or I am just doing it the wrong way) then please advise.

Any help would be much appreciated.

Thanks, Tom

Upvotes: 0

Views: 169

Answers (1)

Gopala

Reputation: 10473

You can do something like this:

mytable <- table(exampledata$Bypass, exampledata$Diagnosis == 'Cystic fibrosis')
colnames(mytable) <- c('Not Cystic fibrosis', 'Cystic fibrosis')

             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6

If you want this same thing done for all categories, you can do this in a function / loop.
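
As a rough sketch of the function route, the helper below also puts the chosen diagnosis in the first column, which is the layout asked for in the question. `one_vs_rest()` is a hypothetical name, not a base R function. The vectors `bypass` and `diagnosis` are rebuilt from the counts in the question's table, so the block runs on its own.

```r
# Rebuild the two columns from the counts in the question's table
# (28 patients off bypass, 32 on bypass).
bypass <- rep(c("Bypass off", "Bypass on"), times = c(28, 32))
diagnosis <- c(
  rep(c("Cystic fibrosis", "Other", "PH", "UIP"), times = c(6, 11, 7, 4)),  # off
  rep(c("Cystic fibrosis", "Other", "PH", "UIP"), times = c(6, 8, 9, 9))    # on
)

# one_vs_rest() is a hypothetical helper, not part of base R.
one_vs_rest <- function(group, category, level) {
  rest <- paste("Not", level)
  # Fixing the factor levels puts the chosen level in the first column.
  flag <- factor(ifelse(category == level, level, rest),
                 levels = c(level, rest))
  table(Bypass = group, Diagnosis = flag)
}

cf_table <- one_vs_rest(bypass, diagnosis, "Cystic fibrosis")
cf_table
```

With real data the call would presumably be `one_vs_rest(exampledata$Bypass, exampledata$Diagnosis, "Cystic fibrosis")`.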

EDIT: adding a loop option to get all the tables needed:

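# levels() returns NULL when Diagnosis is a character column (the read.table
# default since R 4.0), so take the sorted unique values as character instead
lapply(sort(unique(as.character(exampledata$Diagnosis))), function(x) {
         mytable <- table(exampledata$Bypass, exampledata$Diagnosis == x)
         colnames(mytable) <- c(paste('Not ', x, sep = ''), x)
         mytable
       })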

Output is as follows:

[[1]]

             Not Cystic fibrosis Cystic fibrosis
  Bypass off                  22               6
  Bypass on                   26               6

[[2]]

             Not Other Other
  Bypass off        17    11
  Bypass on         24     8

[[3]]

             Not PH PH
  Bypass off     21  7
  Bypass on      23  9

[[4]]

             Not UIP UIP
  Bypass off      24   4
  Bypass on       23   9

To run chi-square tests on each of the above tables, save the output of the lapply call above to a variable. Let us call it l.

Then use:

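lapply(l, chisq.test)

The output is a list of four test results, one per diagnosis. (sapply(l, chisq.test) would instead try to simplify the results into a matrix, which is harder to read.)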
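
Once the test results are stored in a list, the p-values can be pulled out into one vector. The sketch below rebuilds the four one-vs-rest tables from the counts in the question so that it runs on its own. The names `counts`, `tables`, `tests`, `p_values` and `p_adjusted` are illustrative. Adjusting for four tests with Holm's method is an assumption about what the analysis needs, not something the question asked for.

```r
# Counts copied from the Bypass x Diagnosis table in the question.
counts <- matrix(c(6, 11, 7, 4,
                   6,  8, 9, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Bypass off", "Bypass on"),
                                 c("Cystic fibrosis", "Other", "PH", "UIP")))

# One 2 x 2 table per diagnosis: that diagnosis versus all the others.
tables <- lapply(colnames(counts), function(d) {
  cbind(Not = rowSums(counts) - counts[, d], Yes = counts[, d])
})
names(tables) <- colnames(counts)

# chisq.test() applies Yates' continuity correction to 2 x 2 tables by default.
tests <- lapply(tables, chisq.test)

# vapply() pulls one number out of each test result.
p_values <- vapply(tests, function(t) t$p.value, numeric(1))

# Holm adjustment for running four tests (the method is an assumption).
p_adjusted <- p.adjust(p_values, method = "holm")

round(cbind(raw = p_values, holm = p_adjusted), 3)
```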

Of course, once you save the lapply output to a list l, you can also run individual chi-square tests like:

chisq.test(l[[1]])
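
On the question of whether there is a better way: one possible alternative, offered as a suggestion only, is a single test on the full 2 x 4 table. It asks whether the mix of diagnoses differs between the bypass groups at all, before any one diagnosis is examined. The sketch uses the counts from the question; `counts`, `overall` and `exact` are illustrative names.

```r
# Counts copied from the Bypass x Diagnosis table in the question.
counts <- matrix(c(6, 11, 7, 4,
                   6,  8, 9, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Bypass    = c("Bypass off", "Bypass on"),
                                 Diagnosis = c("Cystic fibrosis", "Other", "PH", "UIP")))

# Chi-square test of independence on the whole table: (2 - 1) * (4 - 1) = 3 df.
overall <- chisq.test(counts)
overall

# Expected counts under independence; the smallest here is 5.6.
overall$expected

# Fisher's exact test is a common fallback when expected counts are small.
exact <- fisher.test(counts)
exact$p.value
```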

Upvotes: 1
