ZArmstrong
ZArmstrong

Reputation: 77

Aggregating across a multiple choice question data structure

I have a dataset that has a multiple choice question that's been broken out so each option is its own column. I also have weight column and I'm looking to aggregate across each column of the multiple choice to find the sum of the weight column that selected that option. I know I could do this in a for loop to go across each column, but I was looking for a cleaner method.

Any suggestions?

Dput:

structure(list(`Medical (treatment for any diagnosed medical condition)` = c("0", 
NA, "0", "Medical (treatment for any diagnosed medical condition)", 
"Medical (treatment for any diagnosed medical condition)", "Medical (treatment for any diagnosed medical condition)", 
"Medical (treatment for any diagnosed medical condition)", "0", 
"0", "0"), `Dental (preventive or routine care for oral health)` = c("0", 
NA, "0", "Dental (preventive or routine care for oral health)", 
"0", "Dental (preventive or routine care for oral health)", "Dental (preventive or routine care for oral health)", 
"0", "0", "0"), `Vision (preventive or routine care for eye health)` = c("0", 
NA, "Vision (preventive or routine care for eye health)", "Vision (preventive or routine care for eye health)", 
"0", "Vision (preventive or routine care for eye health)", "0", 
"0", "0", "0"), `Wellness incentives program (such as smoking cessation or weight loss programs)` = c("0", 
NA, "0", "0", "0", "0", "0", "0", "Wellness incentives program (such as smoking cessation or weight loss programs)", 
"Wellness incentives program (such as smoking cessation or weight loss programs)"
), `Gym membership` = c("0", NA, "0", "0", "0", "0", "0", "Gym membership", 
"Gym membership", "Gym membership"), `Life insurance` = c("Life insurance", 
NA, "Life insurance", "Life insurance", "0", "Life insurance", 
"0", "0", "0", "Life insurance"), `Accidental death and dismemberment` = c("0", 
NA, "0", "Accidental death and dismemberment", "0", "Accidental death and dismemberment", 
"0", "0", "0", "0"), `Health savings account (HSA)` = c("0", 
NA, "0", "0", "Health savings account (HSA)", "Health savings account (HSA)", 
"Health savings account (HSA)", "0", "0", "Health savings account (HSA)"
), `Flexible spending account (FSA)` = c("0", NA, "0", "0", "0", 
"0", "0", "0", "0", "0"), `Health reimbursement arrangements (HRA)` = c("Health reimbursement arrangements (HRA)", 
NA, "0", "0", "0", "0", "Health reimbursement arrangements (HRA)", 
"0", "0", "0"), `State medical savings program (MSP)` = c("0", 
NA, "0", "0", "0", "0", "0", "State medical savings program (MSP)", 
"0", "0"), `Pharmacy card` = c("0", NA, "0", "0", "0", "0", "0", 
"0", "0", "0"), `Over-the-counter/supplemental benefits` = c("0", 
NA, "0", "0", "Over-the-counter/supplemental benefits", "Over-the-counter/supplemental benefits", 
"0", "0", "0", "0"), `Other, please specify:` = c("0", NA, "0", 
"0", "0", "0", "0", "0", "0", "0"), weight = c(102099.073110486, 
99927.5001779742, 100569.577453382, 98542.6996469168, 38237.391427588, 
82498.4815047276, 99924.9006748349, 82498.4815047276, 79954.3106465319, 
91795.9294203397)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 2

Views: 84

Answers (2)

TimTeaFan
TimTeaFan

Reputation: 18551

We could pivot_longer() and then count(name, value, wt = weight):

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(cols = - weight) %>% 
  count(name, value, wt = weight)

#> # A tibble: 39 × 3
#>    name                                                value                   n
#>    <chr>                                               <chr>               <dbl>
#>  1 Accidental death and dismemberment                  0                  5.95e5
#>  2 Accidental death and dismemberment                  Accidental death … 1.81e5
#>  3 Accidental death and dismemberment                  <NA>               9.99e4
#>  4 Dental (preventive or routine care for oral health) 0                  4.95e5
#>  5 Dental (preventive or routine care for oral health) Dental (preventiv… 2.81e5
#>  6 Dental (preventive or routine care for oral health) <NA>               9.99e4
#>  7 Flexible spending account (FSA)                     0                  7.76e5
#>  8 Flexible spending account (FSA)                     <NA>               9.99e4
#>  9 Gym membership                                      0                  5.22e5
#> 10 Gym membership                                      Gym membership     2.54e5
#> # … with 29 more rows

Created on 2022-09-17 by the reprex package (v0.3.0)

Upvotes: 0

akrun
akrun

Reputation: 887048

Perhaps this helps

library(dplyr)
library(purrr)
map(setdiff(names(df1), "weight"),
   ~  df1 %>% 
       select(.x, weight) %>%
       group_by(across(all_of(.x))) %>%
       summarise(weight = sum(weight, na.rm = TRUE)))

Upvotes: 1

Related Questions