polarlicht
polarlicht

Reputation: 21

How to set sum of specific rows into proportion to all rows for multiple columns in R

[Edit: I found a way to do it for the data in tidy/wide format (colSums(dfw[4:5,2:4], na.rm = T)/colSums(dfw[,2:4], na.rm = T)) but does anyone know of a way to do it for data in long format?]

An alternative title (data in long format) could be "How to set sum of specific values of subcategory x into proportion to sum of all values of subcategory x, applied to all categories?".

In less abstract terms: I have multiple questions on a five point scale of which I want to calculate the top-2 proportions. The data is aggregated by answer option already (q1: 5 people chose answer option 1, 50 chose answer option 2 etc.). Therefore I have to sum up values for rows 4 and 5 and divide that sum by the sum of all five rows.

example data in wide format and the code to transform into long:

library(tidyverse)
dfw <- data.frame(answer_option = c(1,2,3,4,5), q1 = c(5,50,170,368,49), q2 = c(24,111,149,213,90), q3 = c(NA, 18, 45, 317, 266))
dfl <- dfw %>% pivot_longer(cols = 1:4)

The output should give me the sum of answer options 4 and 5 divided by the sum of all answer options for each question and could look like this:

q1     q2     q3
64.95  51.62  90.25

I could calculate everything manually (sum(dfw$q1[dfw$answer_option>3])/sum(dfw$q1)) but that is not viable for large datasets like mine.

datasets:

  answer_option  q1  q2  q3
1             1   5  24  NA
2             2  50 111  18
3             3 170 149  45
4             4 368 213 317
5             5  49  90 266
> dfl
# A tibble: 20 x 2
   name          value
   <chr>         <dbl>
 1 answer_option     1
 2 q1                5
 3 q2               24
 4 q3               NA
 5 answer_option     2
 6 q1               50
 7 q2              111
 8 q3               18
 9 answer_option     3
10 q1              170
11 q2              149
12 q3               45
13 answer_option     4
14 q1              368
15 q2              213
16 q3              317
17 answer_option     5
18 q1               49
19 q2               90
20 q3              266

Upvotes: 1

Views: 86

Answers (1)

polarlicht
polarlicht

Reputation: 21

Okay, its actually kind of easy:

colSums(dfw[4:5,2:4], na.rm = T)/colSums(dfw[,2:4], na.rm = T)

Upvotes: 1

Related Questions