Chris
Chris

Reputation: 353

Function for creating relative frequency tables for each variable

I have a dataset with 220 variables (columns). My data look like this

    test      var1   var2      var3
1   Pretest   No    Sometimes   No
2   Postest   No    Sometimes   No
3   Pretest   No    Yes         No
4   Postest   No    Yes         No
5   Pretest   No    Sometimes   No
6   Postest   No    Yes         No
7   Pretest   No    Yes         No
8   Postest   No    No          No
9   Pretest   No    Yes         No
10  Postest   No    Sometimes   No
11  Pretest   No    Sometimes   No
12  Postest   No    No  No
13  Pretest   Yes   Yes No
14  Postest   No    Sometimes   No
15  Pretest   No    No  No
16  Postest   No    Sometimes   No
17  Pretest   No    Sometimes   No
18  Postest   No    No  No
19  Pretest   No    Yes No
20  Postest   No    Yes No

For each variable I want to produce a relative frequency table by group. To that end, I'm using dplyr

propvar1 <- dataprepost %>% 
  group_by(test,var1) %>%
  summarise(n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))

For instance, this gives me the table I want for Var1:

    test      var1      n     rel.freq
1   Postest   Sometimes 120   14%
2   Postest   No        667   79%
3   Postest   Yes       44    5%
4   Postest   NA        10    1%
5   Pretest   Sometimes 155   18%
6   Pretest   No        623   74%
7   Pretest   Yes       49    6%
8   Pretest   NA        14    2%

How can I create a function to do this automatically for each variable in the data set, so I end up with a similar table for each variable?

Upvotes: 0

Views: 371

Answers (1)

12b345b6b78
12b345b6b78

Reputation: 1015

Not the most elegant piece of code... But it works!

Data

d <- structure(list(test = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 
                                       1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Postest", 
                                                                                                       "Pretest"), class = "factor"), var1 = structure(c(1L, 1L, 1L, 
                                                                                                                                                         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                                                                                                                                         1L), .Label = c("No", "Yes"), class = "factor"), var2 = structure(c(2L, 
                                                                                                                                                                                                                             2L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 2L, 
                                                                                                                                                                                                                             1L, 3L, 3L), .Label = c("No", "Sometimes", "Yes"), class = "factor"), 
                    var3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
                                       1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "No", class = "factor")), .Names = c("test", 
                                                                                                                              "var1", "var2", "var3"), class = "data.frame", row.names = c("1", 
                                                                                                                                                                                           "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
                                                                                                                                                                                           "14", "15", "16", "17", "18", "19", "20"))`

Solution

output <- NULL

for (t_cat in unique(d$test)) {
  x <- d[d$test == t_cat, ]
  x$test <- NULL

  l <- lapply(x, function(x) {
    output <- data.frame(table(x))
    colnames(output) <- c('val', 'n')
    output$rel.freq <- output$n/length(x)
    output$category <- t_cat
    return(output)
  })

  r <- do.call("rbind", lapply(names(l), function(x) { cbind(l[[x]], 'var' = x)}))

  output <- rbind(output, r)
}

print(output)

# val  n rel.freq category  var
# 1         No  9      0.9  Pretest var1
# 2        Yes  1      0.1  Pretest var1
# 3         No  1      0.1  Pretest var2
# 4  Sometimes  4      0.4  Pretest var2
# 5        Yes  5      0.5  Pretest var2
# 6         No 10      1.0  Pretest var3
# 7         No 10      1.0  Postest var1
# 8        Yes  0      0.0  Postest var1
# 9         No  3      0.3  Postest var2
# 10 Sometimes  4      0.4  Postest var2
# 11       Yes  3      0.3  Postest var2
# 12        No 10      1.0  Postest var3

Upvotes: 2

Related Questions