Reputation: 1040
I am trying to find the weighted cross tabulations of several columns. My actual data has over 500 columns so I am trying to automate the process as much as possible.
This is a snippet of my data:
data_in <- read_table2("Q50_1 Q50_2 Q38 Q90 pov gender wgt
1 3 Yes 2 High M 1.3
2 4 No 2 Med F 0.4
4 2 Yes 4 Low F 1.2
3 3 No 2 High M 0.5
1 2 No 4 High M 0.7
2 2 Yes 3 Low F 0.56
4 4 Yes 2 Med F 0.9
")
What I have tried:
I could use the library(pollster) package to find the weighted frequencies one at a time like this. I'm using pivot longer here because I need each of these to look similar so that I rbind them into a single dataset.
crosstab(df = data_in, pov, Q50_1, wgt,format = "long") %>% pivot_longer(cols = starts_with("Q"))
crosstab(df = data_in, pov, Q38, wgt, format = "long") %>% pivot_longer(cols = starts_with("Q"))
crosstab(df = data_in, gender, Q50_1, wgt,format = "long")%>% pivot_longer(cols = starts_with("Q"))
crosstab(df = data_in, gender, Q50_2, wgt,format = "long")%>% pivot_longer(cols = starts_with("Q"))
This is my desired output:
data_out <- read_table2("group pct n question response group_level
High 80 2.5 Q50_1 1 pov
High 20 2.5 Q50_1 3 pov
Low 31.8 1.76 Q50_1 2 pov
Low 68.2 1.76 Q50_1 4 pov
Med 30.8 1.3 Q50_1 2 pov
Med 69.2 1.3 Q50_1 4 pov
High 48 2.5 Q38 No pov
High 52 2.5 Q38 Yes pov
Low 100 1.76 Q38 Yes pov
Med 30.8 1.3 Q38 No pov
Med 69.2 1.3 Q38 Yes pov
F 31.4 3.06 Q50_1 2 gender
F 68.6 3.06 Q50_1 4 gender
M 80 2.5 Q50_1 1 gender
M 20 2.5 Q50_1 3 gender
")
More of what I have tried:
I had the idea of creating some kind of function, that loops through two vectors. The first vector contains each of the X values that will go into cross tabs and the second will contain each of the Y values that will feed into cross tabs. This is what I have so far. I have been tweaking this for some time now to no avail.
vect <- c("gender", "pov")
vect2 <- c("Q50_1","Q38")
func2 <- function(.data, vector, vector2) {
.data <- data_in
wgt <- .data$wgt
for (i in 1:length(vector))
for (j in 1:length(vector2)) {
out <- crosstab(df = .data, vector[i], vector2[j], wgt,format = "long")
}
out
}
func2(vect, vect2)
Help with:
Any suggestions appreciated! Thank you for your help!!
Upvotes: 3
Views: 713
Reputation: 7385
Here is one possible way to do this. I'm sure there are better ways, but not in my current pay grade:
library(tidyverse)
library(pollster)
# Create a function that runs your crosstabs and pivots data:
xtab_func <- function(data, col, target){
col <- sym(col)
target <- enquo(target)
crosstab(df = data, !!target, !!col, wgt, format = "long") %>% pivot_longer(cols = starts_with("Q"))
}
# Select 'Q' columns for your loops
cols <- data_in %>% select(starts_with("Q")) %>% names(.)
d <- list()
for (i in cols){
x <- xtab_func(data_in, i, gender)
x$i <- i
x$group_level <- "gender"
d[[i]] <- x
}
df <- do.call(rbind, d) %>%
rename(group = gender)
d <- list()
for (i in cols){
x <- xtab_func(data_in, i, pov)
x$i <- i
x$group_level <- "pov"
d[[i]] <- x
}
df2 <- do.call(rbind, d) %>%
rename(group = pov)
final <- rbind(df, df2) %>%
select(-i, question = name, response = value)
Final looks like this:
# A tibble: 37 x 6
group pct n question value group_level
<fct> <dbl> <dbl> <chr> <fct> <chr>
1 F 31.4 3.06 Q50_1 2 gender
2 F 68.6 3.06 Q50_1 4 gender
3 M 80 2.5 Q50_1 1 gender
4 M 20 2.5 Q50_1 3 gender
5 F 57.5 3.06 Q50_2 2 gender
6 F 42.5 3.06 Q50_2 4 gender
7 M 28.0 2.5 Q50_2 2 gender
8 M 72 2.5 Q50_2 3 gender
9 F 13.1 3.06 Q38 No gender
10 F 86.9 3.06 Q38 Yes gender
Upvotes: 4