Sascha

Reputation: 159

Can I mutate (divide) columns with different lengths in R?

I was trying to calculate the relative frequency of people in each group who chose a specific answer.

I therefore created a small data frame containing the groups (1:4) and the answers to the question (1 or 2), with 10 individuals in total.

library(dplyr)

Group    <- c(1,1,1,1,2,2,2,3,3,4)
Question <- c(1,2,1,2,2,1,2,1,2,1)
DF <- data.frame(Group, Question)
DF
   Group   Question
1      1        1
2      1        2
3      1        1
4      1        2
5      2        2
6      2        1
7      2        2
8      3        1
9      3        2
10     4        1    

I then counted the individuals per group:

ind_p_gr <- DF %>%
  group_by(Group) %>%
  summarise(Count = n())

ind_p_gr

   # A tibble: 4 x 2
   Group Count
<dbl> <int>
1     1     4
2     2     3
3     3     2
4     4     1

... and then selected only the "Count" column:

(count_select <- select(ind_p_gr, Count))
  
  Count
  <int>
1     4
2     3
3     2
4     1

Next, I tried to calculate the relative frequency per group, but only for answers with the value 2. My idea was to count the answers first and then filter() for Question == 2.

(count(DF, Group, Question) %>%
 filter(Question == 2))
  
  Group Question n
1     1        2 2
2     2        2 2
3     3        2 1

This already shows that Group 4 doesn't have any individuals who answered the question with "2". The new data frame is therefore shorter and contains only 3 rows (instead of 4).

I then selected only "Group" and "n" and used mutate() to add a new column called "rel_freq" by dividing n by count_select. Together with filter() and count(), the code for the relative frequencies looks like this:

rel <- count(DF, Group, Question) %>%
filter(Question == 2) %>%
select(Group, n) %>%
mutate(rel_freq = n / count_select) %>%
select(Group, n, rel_freq)

which yields an error, because the two objects do not have the same size (3 vs. 4 rows):

Error: Problem with `mutate()` column `rel_freq`.
i `rel_freq = n/count_select`.
i `rel_freq` must be size 3 or 1, not 4.

The thing is, once I give Group 4 an answer with the value 2, everything works fine (because both data frames now have the same number of rows, 4 and 4), and I get a table with relative frequencies between 0 and 1.

  Group n     Count
1     1 2 0.5000000
2     2 2 0.6666667
3     3 1 0.5000000
4     4 1 1.0000000

Is there a way to work around this problem? Thanks in advance!

Upvotes: 0

Views: 910

Answers (3)

Ronak Shah

Reputation: 388982

For each Group, you can count the number of individuals with n(), and you can calculate the proportion of answers equal to 2 with mean(Question == 2). This gives 0 when nobody in a group answered with 2.

library(dplyr)

DF %>%
  group_by(Group) %>%
  summarise(count = n(), 
            Question_2 = mean(Question == 2))

#  Group count Question_2
#  <dbl> <int>      <dbl>
#1     1     4      0.5  
#2     2     3      0.667
#3     3     2      0.5  
#4     4     1      0    
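If the number of "2" answers per group is also needed next to the proportion, a minimal sketch along the same lines could look like this. The column names n_2 and rel_freq are only illustrative, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Data from the question
DF <- data.frame(
  Group    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  Question = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1)
)

res <- DF %>%
  group_by(Group) %>%
  summarise(
    Count    = n(),                 # individuals per group
    n_2      = sum(Question == 2),  # individuals who answered 2 (0 if none)
    rel_freq = n_2 / Count          # same value as mean(Question == 2)
  )

res
```

Because everything is computed inside one summarise() call, every group keeps exactly one row, so no two objects of different length ever have to be divided.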

Upvotes: 0

akrun

Reputation: 887173

We may use a join here

library(dplyr)
library(tidyr)
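count(DF, Group, Question) %>%
  filter(Question == 2) %>%
  select(Group, n) %>%
  # right_join keeps all four groups; Group 4 gets NA for n
  right_join(ind_p_gr, by = "Group") %>%
  mutate(rel_freq = n / Count) %>%
  # NA is filled with 1 to reproduce the table in the question;
  # use 0 instead to report that Group 4 has no "2" answers
  mutate(across(c(n, rel_freq), ~ replace_na(.x, 1))) %>%
  select(Group, n, Count = rel_freq)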

Output

Group n     Count
1     1 2 0.5000000
2     2 2 0.6666667
3     3 1 0.5000000
4     4 1 1.0000000
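As a variant of the join idea, here is a sketch that starts from the per-group totals and fills the missing group with 0 rather than 1, assuming a zero frequency is what is wanted for a group without any "2" answers. The object names totals and twos are made up for this example, and coalesce() from dplyr is used in place of replace_na().

```r
library(dplyr)

# Data from the question
DF <- data.frame(
  Group    = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4),
  Question = c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1)
)

# Individuals per group (4 rows)
totals <- count(DF, Group, name = "Count")

# Individuals who answered 2, per group (3 rows, Group 4 is absent)
twos <- DF %>%
  filter(Question == 2) %>%
  count(Group, name = "n")

rel <- totals %>%
  left_join(twos, by = "Group") %>%   # keeps all groups; Group 4 gets NA
  mutate(n        = coalesce(n, 0L),  # no "2" answers means a count of 0
         rel_freq = n / Count) %>%
  select(Group, n, rel_freq)

rel
```

Joining by Group aligns the two tables row by row, which is what the plain division of a 3-row column by a 4-row table could not do.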

Upvotes: 1

Vinícius Félix

Reputation: 8811

If I understood correctly, I think this will help

Libraries

library(dplyr)

Data

group <-        c(1,1,1,1,2,2,2,3,3,4)
answer <-       c(1,2,1,2,2,1,2,1,2,1)
df <- data.frame(group, answer)

Code

df %>% 
  count(group,answer) %>% 
  group_by(group) %>% 
  mutate(
    N = sum(n),
    prop = n/N,
    perc = 100*prop
  )

Output

  group answer     n     N  prop  perc
  <dbl>  <dbl> <int> <int> <dbl> <dbl>
1     1      1     2     4 0.5    50  
2     1      2     2     4 0.5    50  
3     2      1     1     3 0.333  33.3
4     2      2     2     3 0.667  66.7
5     3      1     1     2 0.5    50  
6     3      2     1     2 0.5    50  
7     4      1     1     1 1     100
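To get only the rows for answer 2 while still keeping group 4, one possible extension (a sketch, not part of the code above) is to fill in the missing group/answer combinations with complete() from tidyr before computing the proportions. Using tidyr here is an assumption on top of the libraries listed earlier.

```r
library(dplyr)
library(tidyr)

# Data from the answer
group  <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4)
answer <- c(1, 2, 1, 2, 2, 1, 2, 1, 2, 1)
df <- data.frame(group, answer)

res <- df %>%
  count(group, answer) %>%
  # add the combinations that never occur (group 4, answer 2) with n = 0
  complete(group, answer, fill = list(n = 0L)) %>%
  group_by(group) %>%
  mutate(N = sum(n), prop = n / N) %>%
  ungroup() %>%
  filter(answer == 2)

res
```

The result has one row per group, and group 4 shows n = 0 and prop = 0 instead of being dropped.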

Upvotes: 0
