Summarise all columns based on conditions in 2 columns

Question

This is my input data:

structure(list(exp_sal = c(1, 1, NA, NA), curr_sal = c(1, NA, 
1, NA), `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 
178L), `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 
111L), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L), 
    `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(NA, 
-4L), class = "data.frame")

I want each column to have summary counts based on conditions: If exp_sal is not NA, then add up each column If curr_sal is not NA, then add up each column

The result:

I would like to have rows 1 and 3 summarised for exp_sal and rows 1 and 3 summarised for curr_sal and row 4 dropped entirely.

The result I want:

result <- structure(list(exp_sal = c(1, NA), curr_sal = c(NA, 1), 
                     `1` = c(97L, 296L), `2` = c(297L, 225L), 
                     `3` = c(358L, 128L), `4` = c(177L, 49L), 
                     `5` = c(85L, 28L), `6` = c(61L, 9L), 
                     `7` = c(31L, 2L), `8` = c(53L, 79L)), 
                     row.names = c(NA, -2L), class = "data.frame")

I have looked at this answer

Sum Values of Every Column in Data Frame with Conditional For Loop

but I don't know if I should use mutate and summarise_at

or summarise_if or case_when

Sorry about posting such a basic question - any help or advice would be greatly appreciated.

tjebo · Accepted Answer

Your data is messy. I recommend reshaping it for easier aggregation. One way is like this: (comments in the code)

mydf <- structure(list(  exp_sal = c(1, 1, NA, NA), curr_sal = c(    1, NA,    1, NA  ), `1` = c(59L, 33L, 237L, 244L), `2` = c(    98L, 199L, 127L,    178L  ), `3` = c(75L, 283L, 53L, 141L), `4` = c(    26L, 151L, 23L,    111L  ), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L),  `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(  NA,  -4L), class = "data.frame")

library(tidyverse) #also to load tidyr

mydf %>% gather(key, value, -exp_sal,-curr_sal) %>% # crucial step to make data long
  mutate(curr_val = ifelse(curr_sal == 1,value,NA),
         exp_val = ifelse(exp_sal == 1,value,NA)) %>% #this step actually cleans up the data and assigns a value to each new column for 'exp' and 'curr'
  group_by(key) %>% #for your summary, because you want to sum up your previous rows which are now assigned a key in a new column
  summarise_at( .vars = vars(curr_val, exp_val), .funs = sum, na.rm = TRUE)
#> # A tibble: 8 x 3
#>   key   curr_val exp_val
#>          
#> 1 1          296      92
#> 2 2          225     297
#> 3 3          128     358
#> 4 4           49     177
#> 5 5           28      85
#> 6 6            9      61
#> 7 7            2      31
#> 8 8           79      53

^{Created on 2019-11-17 by the reprex package (v0.2.1)}

You can look at each intermediate step by removing the pipes. If you really need the data in the shape of your presented result, try t() But to be honest, I don't think this would be helpful for further analysis..

Summarise all columns based on conditions in 2 columns

Answers (1)

Related Questions