Reputation: 11
I have a table with responses to multiple items in a survey (e.g., 1 = disagree and 7 = agree).
var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 1)
df <- cbind(var1, var2, var3)
To prepare for a plot, I would like to obtain a frequency table through:
library(dplyr)   # %>% and rename()
library(tibble)  # rownames_to_column()
frequenties <- df %>%
  apply(2, table) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  rename(antwoord = rowname)
That works. However, if not every answer option occurs in every variable, I run into trouble.
In the example below, the value 7 does not appear in var3.
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)
df <- cbind(var1, var2, var3)
If I run the same code:
frequenties <- df %>%
  apply(2, table) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  rename(antwoord = rowname)
The error is: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 7, 6
I understand the problem: the tables differ in length. table() does not report values with a count of zero, and as a consequence the table for var3 is shorter.
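A minimal sketch of the cause, using the data above: because the per-column tables differ in length, apply() cannot simplify them to a matrix and returns a list instead. The name tabs is only illustrative.

```r
var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)  # no 7 in this column
df <- cbind(var1, var2, var3)

# One table per column; table() only lists values that actually occur
tabs <- apply(df, 2, table)

is.list(tabs)         # TRUE: the results could not be simplified to a matrix
sapply(tabs, length)  # var1 = 7, var2 = 7, var3 = 6
```

as.data.frame() then fails because it cannot build columns of 7 and 6 rows side by side.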
I do not know how to solve this problem. Is there a way to deal with empty categories? Or is there another way to make a frequency table, and if so, how?
Upvotes: 1
Reputation: 887691
Another option is to pivot to long format with pivot_longer, use count, and reshape back to 'wide' with pivot_wider:
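library(dplyr)
library(tidyr)
# pivot_longer() needs a data frame, not the matrix returned by cbind()
df <- data.frame(var1, var2, var3)
df %>%
  pivot_longer(cols = everything(), values_to = 'antwoord') %>%
  count(name, antwoord) %>%
  # values_fill = 0 shows a missing answer as 0 instead of NA
  pivot_wider(names_from = name, values_from = n, values_fill = 0)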
Upvotes: 1
Reputation: 24848
One approach is to convert the variables to factors that include all 7 levels. Then the output of table will include all 7 possibilities:
library(dplyr)
library(purrr)
library(tibble)  # rownames_to_column()
as.data.frame(df) %>%
  mutate(across(starts_with("var"), ~ factor(., levels = 1:7))) %>%
  map_dfc(table) %>%
  rownames_to_column(var = "antwoord")
# A tibble: 7 x 4
  antwoord var1    var2    var3
  <chr>    <table> <table> <table>
1 1        1       3       3
2 2        2       2       2
3 3        1       1       2
4 4        2       1       1
5 5        1       1       1
6 6        2       2       2
7 7        2       1       0
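The same factor idea can also be sketched in base R alone, without dplyr or purrr. This assumes the scale runs from 1 to 7; the intermediate name counts is only illustrative.

```r
var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 7, 7, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 7, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)
df <- cbind(var1, var2, var3)

# Assumption: valid answers are 1..7, so every table gets all 7 levels
counts <- sapply(as.data.frame(df), function(x) table(factor(x, levels = 1:7)))

# All tables now have length 7, so sapply() returns a 7 x 3 matrix
frequenties <- data.frame(antwoord = rownames(counts), counts)
frequenties
```

Because every column is tabulated against the same levels, an answer that never occurs shows up as 0 rather than being dropped.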
An alternative approach would be to pivot the data using tidyr::pivot_longer and then use dplyr::tally:
library(tidyr)
as.data.frame(df) %>%
  pivot_longer(cols = everything(), values_to = "antwoord") %>%
  group_by(name, antwoord) %>%
  tally() %>%
  pivot_wider(names_from = "name", values_from = n, values_fill = 0)
# A tibble: 7 x 4
  antwoord  var1  var2  var3
     <dbl> <int> <int> <int>
1        1     1     3     3
2        2     2     2     2
3        3     1     1     2
4        4     2     1     1
5        5     1     1     1
6        6     2     2     2
7        7     2     1     0
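One caveat: values_fill = 0 only fills in answers that occur in at least one variable. If an answer is absent from every column, its row is missing entirely. A possible sketch for that case uses tidyr::complete() to add the missing combinations before reshaping. The data below is hypothetical (no 7 anywhere), and the 1 to 7 range is an assumption.

```r
library(dplyr)
library(tidyr)

# Hypothetical data in which the answer 7 never occurs
var1 <- c(2, 2, 4, 1, 5, 3, 4, 6, 6, 6, 6)
var2 <- c(3, 4, 5, 1, 1, 2, 6, 6, 6, 1, 2)
var3 <- c(1, 2, 3, 1, 2, 3, 4, 5, 6, 6, 1)

frequenties <- data.frame(var1, var2, var3) %>%
  pivot_longer(cols = everything(), values_to = "antwoord") %>%
  count(name, antwoord) %>%
  # Assumption: valid answers are 1..7; add absent combinations with n = 0
  complete(name, antwoord = as.numeric(1:7), fill = list(n = 0L)) %>%
  pivot_wider(names_from = name, values_from = n)

frequenties
```

The result has one row per answer from 1 to 7, with zeros in the row for 7.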
Upvotes: 1