A.Foetisch
A.Foetisch

Reputation: 23

Mode computation on counted categorical variables

Here is my dataset:

X Totally.Disagree Disagree Agree Totally.agree
0                2        9   111           122
1                2       30   124            88
2                4       31   119            90
3               10       43   138            53
4               33       54    85            72
5               43       79    89            33
6               48       83    94            19
7               51       98    80            15
8               50      102    75            17
9               51       96    80            17

Where X (thus each line) is a question and the values are the number of people who choose this answer to the question. I would like to compute the mode (the most chosen answer) for each question.

This is what I tried:

df <- gather(df,Answer, count, Totally.Disagree:Totally.agree )
df %>% 
  group_by(X, Answer) %>%
  summarise(sum = count)%>%
  summarise(mode = df$Answer[which(df$count== max(df$count))])

But it doesn't work because max(df$count) refers to the whole dataset and not just one question.

I do not now if the way I tried is correct. If one of you could help me resolve this, I would be very grateful.

Upvotes: 2

Views: 90

Answers (2)

tmfmnk
tmfmnk

Reputation: 40171

A different approach could be:

df %>%
 mutate(mode = max.col(.[2:length(.)])+1) %>%
 rowwise() %>%
 mutate(mode = names(.)[[mode]]) %>%
 select(X, mode)

       X mode         
   <int> <chr>        
 1     0 Totally.agree
 2     1 Agree        
 3     2 Agree        
 4     3 Agree        
 5     4 Agree        
 6     5 Agree        
 7     6 Agree        
 8     7 Disagree     
 9     8 Disagree     
10     9 Disagree  

Here it, first, identifies the index of the column with the greatest count and then assigns the name of the column based on the column index.

And if you want to include also the numbers, you can try:

df %>%
 mutate(mode = max.col(.[2:length(.)])+1) %>%
 rowwise() %>%
 mutate(mode_names =  names(.)[[mode]], 
        mode_numbers = max(!!! rlang::syms(names(.)[2:length(.)]))) %>%
 select(X, mode_names, mode_numbers)

       X mode_names    mode_numbers
   <int> <chr>                <dbl>
 1     0 Totally.agree         122.
 2     1 Agree                 124.
 3     2 Agree                 119.
 4     3 Agree                 138.
 5     4 Agree                  85.
 6     5 Agree                  89.
 7     6 Agree                  94.
 8     7 Disagree               98.
 9     8 Disagree              102.
10     9 Disagree               96.

Or following your original logic:

df %>%
 gather(mode_names, mode_numbers, -X) %>%
 group_by(X) %>%
 filter(mode_numbers == max(mode_numbers)) %>%
 arrange(X)

       X mode_names    mode_numbers
   <int> <chr>                <int>
 1     0 Totally.agree          122
 2     1 Agree                  124
 3     2 Agree                  119
 4     3 Agree                  138
 5     4 Agree                   85
 6     5 Agree                   89
 7     6 Agree                   94
 8     7 Disagree                98
 9     8 Disagree               102
10     9 Disagree                96

Upvotes: 1

Julius Vainora
Julius Vainora

Reputation: 48241

If you want only the answer itself (without numbers) and we can assume that there are no ties, then

df <- gather(df, Answer, count, Totally.Disagree:Totally.agree)
df %>% group_by(X) %>% summarise(mode = Answer[which.max(count)])
# A tibble: 10 x 2
#        X mode         
#    <int> <chr>        
#  1     0 Totally.agree
#  2     1 Agree        
#  3     2 Agree        
#  4     3 Agree        
#  5     4 Agree        
#  6     5 Agree        
#  7     6 Agree        
#  8     7 Disagree     
#  9     8 Disagree     
# 10     9 Disagree

where Answer[which.max(count)] is basically what you intended to do, but there is no need df$ since you want those calculations to be done group-wise.

Upvotes: 0

Related Questions