Reputation: 159
I am trying to calculate the relative frequency of people who chose a specific answer within each group.
To do so, I created a small data frame containing the groups (1:4) and the answers to the question (1 or 2), with 10 individuals in total.
Group <- c(1,1,1,1,2,2,2,3,3,4)
Question <- c(1,2,1,2,2,1,2,1,2,1)
DF<- data.frame(Group, Question)
DF
Group Question
1 1 1
2 1 2
3 1 1
4 1 2
5 2 2
6 2 1
7 2 2
8 3 1
9 3 2
10 4 1
I then proceeded with counting the individuals per group:
library(dplyr)

ind_p_gr <- DF %>%
  group_by(Group) %>%
  summarise(Count = n())
ind_p_gr
# A tibble: 4 x 2
Group Count
<dbl> <int>
1 1 4
2 2 3
3 3 2
4 4 1
... and selected only the "Count" column:
(count_select <- select(ind_p_gr, Count))
Count
<int>
1 4
2 3
3 2
4 1
Next, I tried to calculate the relative frequency of answers with the value 2 per group. My idea was to count the answers first and then filter() for Question == 2.
(count(DF, Group, Question) %>%
  filter(Question == 2))
Group Question n
1 1 2 2
2 2 2 2
3 3 2 1
This already shows that Group 4 doesn't have any individuals who answered the question with "2". The new data frame is therefore shorter and contains only 3 rows (instead of 4).
I then selected only "Group" and "n" and used mutate() to add a new column called "rel_freq" by dividing n by count_select. Together with the count() and filter() calls, the code for the relative frequencies looks like this:
rel <- count(DF, Group, Question) %>%
  filter(Question == 2) %>%
  select(Group, n) %>%
  mutate(rel_freq = n / count_select) %>%
  select(Group, n, rel_freq)
which yields an error because the two objects do not have the same size (3 vs. 4 rows):
Error: Problem with `mutate()` column `rel_freq`.
i `rel_freq = n/count_select`.
i `rel_freq` must be size 3 or 1, not 4.
The thing is, once I assign an answer of 2 to Group 4, everything works fine (because all data frames now have the same number of rows, 4 and 4), and I get a table with the relative frequencies ranging from 0 to 1.
Group n Count
1 1 2 0.5000000
2 2 2 0.6666667
3 3 1 0.5000000
4 4 1 1.0000000
Is there a way to work around this problem? Thanks in advance!
Upvotes: 0
Views: 910
Reputation: 388982
For each Group you can calculate the number of individuals with n(), whereas to get the proportion of answers equal to 2 you can use mean(Question == 2). This will give you 0 when no person in a group answered with 2.
library(dplyr)
DF %>%
  group_by(Group) %>%
  summarise(count = n(),
            Question_2 = mean(Question == 2))
# Group count Question_2
# <dbl> <int> <dbl>
#1 1 4 0.5
#2 2 3 0.667
#3 3 2 0.5
#4 4 1 0
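The reason mean(Question == 2) works as a proportion is that R treats TRUE as 1 and FALSE as 0 when averaging a logical vector. A minimal base R sketch of that step, using the answers of Group 1 and Group 4 from the question (the names g1, g4, prop_g1 and prop_g4 are only illustrative):

```r
# Answers of Group 1 and Group 4, copied from the question's data
g1 <- c(1, 2, 1, 2)
g4 <- c(1)

# The comparison returns a logical vector: FALSE TRUE FALSE TRUE
is_two <- g1 == 2

# mean() coerces TRUE to 1 and FALSE to 0, so the mean is the share of TRUE values
prop_g1 <- mean(is_two)    # 2 of 4 answers are 2
prop_g4 <- mean(g4 == 2)   # no answer is 2, so the share is 0

prop_g1
prop_g4
```

Because the share is computed inside each group, a group without any answer equal to 2 still gets a row, which avoids the size mismatch from the question.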
Upvotes: 0
Reputation: 887173
We may use a join here
library(dplyr)
library(tidyr)
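count(DF, Group, Question) %>%
  filter(Question == 2) %>%
  select(Group, n) %>%
  # right_join keeps Group 4, which has no answers equal to 2 (n is NA there)
  right_join(ind_p_gr, by = "Group") %>%
  mutate(rel_freq = n / Count) %>%
  # a group without any answer equal to 2 has a count and a relative frequency of 0
  mutate(across(c(n, rel_freq), ~ replace_na(.x, 0))) %>%
  select(Group, n, Count = rel_freq)
-output
  Group n     Count
1     1 2 0.5000000
2     2 2 0.6666667
3     3 1 0.5000000
4     4 0 0.0000000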
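As an alternative to a join, the Group/Question combinations that never occur can be added explicitly with a count of 0, so that every group keeps a row. The following is only a sketch, assuming tidyr is installed; it rebuilds DF from the question so that it runs on its own, and the result name rel is reused from the question for illustration.

```r
library(dplyr)
library(tidyr)

# Data from the question
DF <- data.frame(
  Group    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  Question = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1)
)

rel <- DF %>%
  count(Group, Question) %>%
  # add the Group/Question combinations that do not occur, with a count of 0
  complete(Group, Question, fill = list(n = 0L)) %>%
  group_by(Group) %>%
  # divide each count by the number of individuals in the group
  mutate(rel_freq = n / sum(n)) %>%
  ungroup() %>%
  filter(Question == 2) %>%
  select(Group, n, rel_freq)

rel
```

With this approach Group 4 appears with n = 0 and rel_freq = 0 instead of being dropped by the filter.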
Upvotes: 1
Reputation: 8811
If I understood correctly, I think this will help
library(dplyr)
group <- c(1,1,1,1,2,2,2,3,3,4)
answer <- c(1,2,1,2,2,1,2,1,2,1)
df <- data.frame(group, answer)
df %>%
  count(group, answer) %>%
  group_by(group) %>%
  mutate(
    N = sum(n),
    prop = n / N,
    perc = 100 * prop
  )
group answer n N prop perc
<dbl> <dbl> <int> <int> <dbl> <dbl>
1 1 1 2 4 0.5 50
2 1 2 2 4 0.5 50
3 2 1 1 3 0.333 33.3
4 2 2 2 3 0.667 66.7
5 3 1 1 2 0.5 50
6 3 2 1 2 0.5 50
7 4 1 1 1 1 100
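The same per-group proportions can be cross-checked in base R without dplyr. This is a sketch using the vectors from the question; tab and props are illustrative names. A contingency table contains a 0 for combinations that never occur, so Group 4 keeps an entry for answer 2.

```r
# Data from the question
Group    <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4)
Question <- c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1)

# Cross-tabulate groups (rows) against answers (columns);
# combinations that never occur get a count of 0
tab <- table(Group, Question)

# margin = 1 divides every cell by its row total, i.e. by the group size
props <- prop.table(tab, margin = 1)

# Relative frequency of answer 2 in each group
props[, "2"]
```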
Upvotes: 0