Reputation: 21
[Edit: I found a way to do it for the data in tidy/wide format (colSums(dfw[4:5,2:4], na.rm = T)/colSums(dfw[,2:4], na.rm = T)
) but does anyone know of a way to do it for data in long format?]
An alternative title (data in long format) could be "How to set sum of specific values of subcategory x into proportion to sum of all values of subcategory x, applied to all categories?".
In less abstract terms: I have multiple questions on a five point scale of which I want to calculate the top-2 proportions. The data is aggregated by answer option already (q1: 5 people chose answer option 1, 50 chose answer option 2 etc.). Therefore I have to sum up values for rows 4 and 5 and divide that sum by the sum of all five rows.
example data in wide format and the code to transform into long:
library(tidyverse)
dfw <- data.frame(answer_option = c(1,2,3,4,5), q1 = c(5,50,170,368,49), q2 = c(24,111,149,213,90), q3 = c(NA, 18, 45, 317, 266))
dfl <- dfw %>% pivot_longer(cols = 1:4)
The output should give me the sum of answer options 4 and 5 divided by the sum of all answer options for each question and could look like this:
q1 q2 q3
64.95 51.62 90.25
I could calculate everything manually (sum(dfw$q1[dfw$answer_option>3])/sum(dfw$q1)
) but that is not viable for large datasets like mine.
datasets:
answer_option q1 q2 q3
1 1 5 24 NA
2 2 50 111 18
3 3 170 149 45
4 4 368 213 317
5 5 49 90 266
> dfl
# A tibble: 20 x 2
name value
<chr> <dbl>
1 answer_option 1
2 q1 5
3 q2 24
4 q3 NA
5 answer_option 2
6 q1 50
7 q2 111
8 q3 18
9 answer_option 3
10 q1 170
11 q2 149
12 q3 45
13 answer_option 4
14 q1 368
15 q2 213
16 q3 317
17 answer_option 5
18 q1 49
19 q2 90
20 q3 266
Upvotes: 1
Views: 86
Reputation: 21
Okay, its actually kind of easy:
colSums(dfw[4:5,2:4], na.rm = T)/colSums(dfw[,2:4], na.rm = T)
Upvotes: 1