Lidwien

Reputation: 11

How do I create a frequency table with empty categories in R?

I have a table with responses to multiple survey items, each on a 7-point scale (e.g. 1 = disagree, 7 = agree):

var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1)

df <- cbind(var1, var2, var3)

To prepare for a plot, I would like to obtain a frequency table through:

library(dplyr)
library(tibble)  # for rownames_to_column()

frequenties <- df %>%
   apply(2, table) %>%
   as.data.frame() %>%
   rownames_to_column() %>%
   rename(antwoord = rowname)

That works. However, I run into trouble when some variables do not contain every possible answer.

In the example below, value 7 does not appear.

var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)

df <- cbind(var1, var2, var3)

If I run the same code:

frequenties <- df %>%
    apply(2, table) %>%
    as.data.frame() %>%
    rownames_to_column() %>%
    rename(antwoord = rowname)

The error is: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 7, 6

I understand the problem: the tables differ in length. table() does not report values with a count of zero, so apply returns a list in which the element for var3 is shorter than the others.

I do not know how to solve this. Is there a way to deal with empty categories, or another way to build the frequency table?

Upvotes: 1

Views: 649

Answers (2)

akrun

Reputation: 887691

Another option is to pivot to long format with pivot_longer, count the combinations with count, and reshape back to wide format with pivot_wider:

library(dplyr)
library(tidyr)
df %>%
    pivot_longer(cols = everything(), values_to = 'antwoord') %>%
    count(name, antwoord) %>%
    # values_fill = 0 shows answers missing from a variable as 0 instead of NA
    pivot_wider(names_from = name, values_from = n, values_fill = 0)

data

df <- data.frame(var1, var2, var3)
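
One caveat with a pivot-and-count approach: an answer that occurs in none of the variables gets no row at all. A minimal sketch of one way around that, assuming the scale is known to run from 1 to 7 (that range is taken from the question, not detected from the data), is to fill in the missing combinations with tidyr::complete before reshaping:

```r
library(dplyr)
library(tidyr)

var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)  # no 7 in this variable
df <- data.frame(var1, var2, var3)

frequenties <- df %>%
  pivot_longer(cols = everything(), values_to = "antwoord") %>%
  count(name, antwoord) %>%
  # Add a row for every variable/answer pair on the assumed 1-7 scale;
  # pairs that never occurred get n = 0.
  complete(name, antwoord = as.numeric(1:7), fill = list(n = 0L)) %>%
  pivot_wider(names_from = name, values_from = n)

frequenties
```

Because the full set of answers is stated explicitly, the result should always have seven rows, whichever values happen to be present in the data.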

Upvotes: 1

Ian Campbell

Reputation: 24848

One approach is to convert the variables to factors that include all 7 levels. Then the output of table will include all 7 possibilities:

library(dplyr)
library(purrr)
library(tibble)  # for rownames_to_column()
as.data.frame(df) %>%
  mutate(across(starts_with("var"), ~ factor(., levels = 1:7))) %>%
  map_dfc(table) %>%
  rownames_to_column(var = "antwoord")
# A tibble: 7 x 4
  antwoord var1    var2    var3   
  <chr>    <table> <table> <table>
1 1        1       3       3      
2 2        2       2       2      
3 3        1       1       2      
4 4        2       1       1      
5 5        1       1       1      
6 6        2       2       2      
7 7        2       1       0     
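
The same factor idea can also be sketched in base R, which stays close to the apply call in the question. This is only an illustration, assuming the 1 to 7 scale from the question; the helper name counts is made up for the example:

```r
var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)  # no 7 in this variable
df <- cbind(var1, var2, var3)

# Tabulate each column against the full set of levels, so absent answers
# are counted as 0 and every column yields a table of length 7.
counts <- apply(df, 2, function(x) table(factor(x, levels = 1:7)))

# counts is a 7 x 3 matrix whose row names are the answer values.
frequenties <- data.frame(antwoord = rownames(counts), counts, row.names = NULL)
frequenties
```

Since every column now returns a table of the same length, apply can simplify the result to a matrix and the "differing number of rows" error should no longer occur.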

An alternative approach would be to pivot the data using tidyr::pivot_longer and then use dplyr::tally:

library(tidyr)
as.data.frame(df) %>%
  pivot_longer(cols = everything(), values_to = "antwoord") %>%
  group_by(name, antwoord) %>%
  tally() %>%
  pivot_wider(names_from = "name", values_from = n, values_fill = 0)
# A tibble: 7 x 4
  antwoord  var1  var2  var3
     <dbl> <int> <int> <int>
1        1     1     3     3
2        2     2     2     2
3        3     1     1     2
4        4     2     1     1
5        5     1     1     1
6        6     2     2     2
7        7     2     1     0

Upvotes: 1
