Reputation: 353
I have a dataset with 220 variables (columns). My data look like this
test var1 var2 var3
1 Pretest No Sometimes No
2 Postest No Sometimes No
3 Pretest No Yes No
4 Postest No Yes No
5 Pretest No Sometimes No
6 Postest No Yes No
7 Pretest No Yes No
8 Postest No No No
9 Pretest No Yes No
10 Postest No Sometimes No
11 Pretest No Sometimes No
12 Postest No No No
13 Pretest Yes Yes No
14 Postest No Sometimes No
15 Pretest No No No
16 Postest No Sometimes No
17 Pretest No Sometimes No
18 Postest No No No
19 Pretest No Yes No
20 Postest No Yes No
For each variable I want to produce a relative frequency table by group. To that end, I'm using dplyr
propvar1 <- dataprepost %>%
group_by(test,var1) %>%
summarise(n = n()) %>%
mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
For instance, this gives me the table I want for Var1:
test var1 n rel.freq
1 Postest Sometimes 120 14%
2 Postest No 667 79%
3 Postest Yes 44 5%
4 Postest NA 10 1%
5 Pretest Sometimes 155 18%
6 Pretest No 623 74%
7 Pretest Yes 49 6%
8 Pretest NA 14 2%
How can I create a function to do this automatically for each variable in the data set, so I end up with a similar table for each variable?
Upvotes: 0
Views: 371
Reputation: 1015
Not the most elegant piece of code... But it works!
d <- structure(list(test = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Postest",
"Pretest"), class = "factor"), var1 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("No", "Yes"), class = "factor"), var2 = structure(c(2L,
2L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 2L,
1L, 3L, 3L), .Label = c("No", "Sometimes", "Yes"), class = "factor"),
var3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "No", class = "factor")), .Names = c("test",
"var1", "var2", "var3"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))`
output <- NULL
for (t_cat in unique(d$test)) {
x <- d[d$test == t_cat, ]
x$test <- NULL
l <- lapply(x, function(x) {
output <- data.frame(table(x))
colnames(output) <- c('val', 'n')
output$rel.freq <- output$n/length(x)
output$category <- t_cat
return(output)
})
r <- do.call("rbind", lapply(names(l), function(x) { cbind(l[[x]], 'var' = x)}))
output <- rbind(output, r)
}
print(output)
# val n rel.freq category var
# 1 No 9 0.9 Pretest var1
# 2 Yes 1 0.1 Pretest var1
# 3 No 1 0.1 Pretest var2
# 4 Sometimes 4 0.4 Pretest var2
# 5 Yes 5 0.5 Pretest var2
# 6 No 10 1.0 Pretest var3
# 7 No 10 1.0 Postest var1
# 8 Yes 0 0.0 Postest var1
# 9 No 3 0.3 Postest var2
# 10 Sometimes 4 0.4 Postest var2
# 11 Yes 3 0.3 Postest var2
# 12 No 10 1.0 Postest var3
Upvotes: 2