Set up loop to run several crosstabs simultaneously in R

Question

I am trying to find the weighted cross tabulations of several columns. My actual data has over 500 columns so I am trying to automate the process as much as possible.

This is a snippet of my data:

data_in <- read_table2("Q50_1   Q50_2   Q38 Q90 pov gender  wgt
1   3   Yes 2   High    M   1.3
2   4   No  2   Med F   0.4
4   2   Yes 4   Low F   1.2
3   3   No  2   High    M   0.5
1   2   No  4   High    M   0.7
2   2   Yes 3   Low F   0.56
4   4   Yes 2   Med F   0.9
")

What I have tried:

I could use the library(pollster) package to find the weighted frequencies one at a time like this. I'm using pivot longer here because I need each of these to look similar so that I rbind them into a single dataset.

crosstab(df = data_in, pov, Q50_1, wgt,format = "long") %>% pivot_longer(cols = starts_with("Q"))
crosstab(df = data_in, pov, Q38, wgt, format = "long") %>% pivot_longer(cols = starts_with("Q"))

crosstab(df = data_in, gender, Q50_1, wgt,format = "long")%>% pivot_longer(cols = starts_with("Q"))
crosstab(df = data_in, gender, Q50_2, wgt,format = "long")%>% pivot_longer(cols = starts_with("Q"))

This is my desired output:

data_out <- read_table2("group  pct n   question    response    group_level
High    80  2.5 Q50_1   1   pov
High    20  2.5 Q50_1   3   pov
Low 31.8    1.76    Q50_1   2   pov
Low 68.2    1.76    Q50_1   4   pov
Med 30.8    1.3 Q50_1   2   pov
Med 69.2    1.3 Q50_1   4   pov
High    48  2.5 Q38 No  pov
High    52  2.5 Q38 Yes pov
Low 100 1.76    Q38 Yes pov
Med 30.8    1.3 Q38 No  pov
Med 69.2    1.3 Q38 Yes pov
F   31.4    3.06    Q50_1   2   gender
F   68.6    3.06    Q50_1   4   gender
M   80  2.5 Q50_1   1   gender
M   20  2.5 Q50_1   3   gender
")

More of what I have tried:

I had the idea of creating some kind of function, that loops through two vectors. The first vector contains each of the X values that will go into cross tabs and the second will contain each of the Y values that will feed into cross tabs. This is what I have so far. I have been tweaking this for some time now to no avail.

  vect <- c("gender", "pov")
vect2 <- c("Q50_1","Q38")

func2 <- function(.data, vector, vector2) {
  .data <- data_in
  wgt <- .data$wgt
  
  for (i in 1:length(vector)) 
    for (j in 1:length(vector2)) {
    out <- crosstab(df = .data, vector[i], vector2[j], wgt,format = "long")
  }
  out
}


func2(vect, vect2)

Help with:

Is this a good approach to achieve my desired output? If so, why do I keep getting an error? :(
I would like to add the functionality to this so that I create single data set with all my crosstabs (as shown in the desired output).

Any suggestions appreciated! Thank you for your help!!

Matt · Accepted Answer

Here is one possible way to do this. I'm sure there are better ways, but not in my current pay grade:

library(tidyverse)
library(pollster)

# Create a function that runs your crosstabs and pivots data:

xtab_func <- function(data, col, target){
  col <- sym(col)
  target <- enquo(target)
  crosstab(df = data, !!target, !!col, wgt, format = "long") %>% pivot_longer(cols = starts_with("Q"))
}

# Select 'Q' columns for your loops
cols <- data_in %>% select(starts_with("Q")) %>% names(.)

Loop through your data.frame for gender variables

d <- list()
for (i in cols){
  x <- xtab_func(data_in, i, gender)
  x$i <- i
  x$group_level <- "gender"
  d[[i]] <- x
}

Create data.frame of gender data

df <- do.call(rbind, d) %>% 
  rename(group = gender)

Do the same as above for pov data

d <- list()
for (i in cols){
  x <- xtab_func(data_in, i, pov)
  x$i <- i
  x$group_level <- "pov"
  d[[i]] <- x
}

df2 <- do.call(rbind, d) %>% 
  rename(group = pov)

Clean up column names and drop "i" column

final <- rbind(df, df2) %>% 
  select(-i, question = name, response = value)

Final looks like this:

# A tibble: 37 x 6
   group   pct     n question value group_level
                 
 1 F      31.4  3.06 Q50_1    2     gender     
 2 F      68.6  3.06 Q50_1    4     gender     
 3 M      80    2.5  Q50_1    1     gender     
 4 M      20    2.5  Q50_1    3     gender     
 5 F      57.5  3.06 Q50_2    2     gender     
 6 F      42.5  3.06 Q50_2    4     gender     
 7 M      28.0  2.5  Q50_2    2     gender     
 8 M      72    2.5  Q50_2    3     gender     
 9 F      13.1  3.06 Q38      No    gender     
10 F      86.9  3.06 Q38      Yes   gender

Set up loop to run several crosstabs simultaneously in R

Answers (1)

Loop through your data.frame for gender variables

Create data.frame of gender data

Do the same as above for pov data

Clean up column names and drop "i" column

Related Questions