Afana A

Reputation: 33

How to build a contingency table from multiple categorical attributes in R?

I'm trying to write code, or find a way in R, to transform my categorical questionnaire data into a contingency table. Here is my sample data:

Age = sample(c("15-25", "26-35", "36-45", ">45"), 90, replace = TRUE)
Volunteering_yr = sample(c("1yr", "2yr", "3yr>"), 90, replace = TRUE)
Q1 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
Q2 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
Q3 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
Q4 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
Q5 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
Db = data.frame(Age, Volunteering_yr, Q1, Q2, Q3, Q4, Q5)

I would like the data reorganised by either years of volunteering or Age, with the count of each answer (A, B, C, D and E) as columns, something like this:

(image of the desired table not available)

Any suggestions? Many thanks

Upvotes: 0

Views: 165

Answers (2)

Ian Campbell

Reputation: 24878

You can use the xtabs function:

xtabs(~Volunteering_yr + Q1,Db)
               Q1
Volunteering_yr  A  B  C  D  E
           1yr   6  7  3  6  7
           2yr   6  4  1  5  7
           3yr>  7  6  5  8 12
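To get the same kind of table for every question rather than only Q1, one option is to call xtabs once per question column. A minimal sketch, assuming the question columns are named Q1 to Q5 and using random data in place of the real survey; the set.seed() call and the names `questions` and `per_question` are illustrative, not from the original:

```r
# Random data with the same shape as the question's (not real survey data)
set.seed(1)
Db <- data.frame(
  Age = sample(c("15-25", "26-35", "36-45", ">45"), 90, replace = TRUE),
  Volunteering_yr = sample(c("1yr", "2yr", "3yr>"), 90, replace = TRUE)
)
questions <- paste0("Q", 1:5)
for (q in questions) {
  Db[[q]] <- sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
}

# Build one Volunteering_yr x answer contingency table per question,
# using the formula ~ Volunteering_yr + Q1 for the first column, and so on
per_question <- lapply(questions, function(q) {
  xtabs(as.formula(paste("~ Volunteering_yr +", q)), data = Db)
})
names(per_question) <- questions

per_question$Q2  # the table for question 2
```

This keeps the questions separate; each element of `per_question` has the same layout as the Q1 table above.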

I'm not aware of a simple base R function that does all the questions at once, but it is easy with the tidyverse:

library(tidyverse)
Db %>% 
  pivot_longer(-c(Age,Volunteering_yr)) %>% 
  group_by(Volunteering_yr, value) %>%
  tally() %>%
  pivot_wider(names_from = value, values_from = n)
# # A tibble: 3 x 6
# # Groups:   Volunteering_yr [3]
#   Volunteering_yr     A     B     C     D     E
#   <chr>           <int> <int> <int> <int> <int>
# 1 1yr                33    33    25    26    28
# 2 2yr                28    18    20    23    26
# 3 3yr>               43    46    33    31    37
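Base R can also count all questions at once by stacking every answer into one vector and repeating the grouping column to match. A minimal sketch on random data shaped like the question's; the names `answers`, `vol` and `counts` are illustrative:

```r
# Random data with the same shape as the question's (not real survey data)
set.seed(1)
Db <- data.frame(
  Age = sample(c("15-25", "26-35", "36-45", ">45"), 90, replace = TRUE),
  Volunteering_yr = sample(c("1yr", "2yr", "3yr>"), 90, replace = TRUE)
)
questions <- paste0("Q", 1:5)
for (q in questions) {
  Db[[q]] <- sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
}

# unlist() stacks Q1, Q2, ..., Q5 end to end, so repeating Volunteering_yr
# once per question keeps each answer paired with its respondent
answers <- unlist(Db[questions], use.names = FALSE)
vol <- rep(Db$Volunteering_yr, times = length(questions))

# Fixed factor levels keep a column for every answer, even one never chosen
counts <- table(Volunteering_yr = vol,
                Answer = factor(answers, levels = c("A", "B", "C", "D", "E")))
counts
```

Replacing Db$Volunteering_yr with Db$Age gives the same breakdown by age group.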

Upvotes: 3

pdw

Reputation: 363

Try this. I used the pacman package to load the libraries, but you could load them separately with library().

Db <- data.frame(Age = sample(c("15-25", "26-35", "36-45"), 90, replace = TRUE),
                 Volunteering_yr = sample(c("1yr", "2yr", "3yr"), 90, replace = TRUE),
                 Q1 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE),
                 Q2 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE),
                 Q3 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE),
                 Q4 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE),
                 Q5 = sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE))

pacman::p_load(dplyr, magrittr)
Db %<>% mutate(across(c("Age", "Volunteering_yr"), factor))

with(Db, table(Age, Volunteering_yr))
      Volunteering_yr
Age     1yr 2yr 3yr
  15-25   9  16   6
  26-35   9  13   8
  36-45  11   7  11
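The table above cross-tabulates the two grouping variables with each other. To bring the answers in as well, one option is to reshape to long format in base R and pass three variables to xtabs(); ftable() then flattens the result so that each row is an age group and question pair. A minimal sketch on random data like the answer's; `long` and `tab` are illustrative names:

```r
# Random data shaped like the answer's example (not real survey data)
set.seed(1)
Db <- data.frame(
  Age = sample(c("15-25", "26-35", "36-45"), 90, replace = TRUE),
  Volunteering_yr = sample(c("1yr", "2yr", "3yr"), 90, replace = TRUE)
)
questions <- paste0("Q", 1:5)
for (q in questions) {
  Db[[q]] <- sample(c("A", "B", "C", "D", "E"), 90, replace = TRUE)
}

# Long format: one row per respondent and question
long <- data.frame(
  Age = rep(Db$Age, times = length(questions)),
  Question = rep(questions, each = nrow(Db)),
  Answer = unlist(Db[questions], use.names = FALSE)
)

# Three-way table Age x Question x Answer, printed with answers as columns
tab <- xtabs(~ Age + Question + Answer, data = long)
ftable(tab)
```

Summing `tab` over the Question dimension, e.g. with margin.table(tab, c(1, 3)), collapses it back to one row per age group.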

Upvotes: 1
