EJG_27
EJG_27

Reputation: 111

Summarise all columns based on conditions in 2 columns

This is my input data:

structure(list(exp_sal = c(1, 1, NA, NA), curr_sal = c(1, NA, 
1, NA), `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L, 
178L), `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L, 
111L), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L), 
    `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(NA, 
-4L), class = "data.frame")

I want each column to have summary counts based on conditions: If exp_sal is not NA, then add up each column If curr_sal is not NA, then add up each column

The result:

I would like to have rows 1 and 3 summarised for exp_sal and rows 1 and 3 summarised for curr_sal and row 4 dropped entirely.

The result I want:

result <- structure(list(exp_sal = c(1, NA), curr_sal = c(NA, 1), 
                     `1` = c(97L, 296L), `2` = c(297L, 225L), 
                     `3` = c(358L, 128L), `4` = c(177L, 49L), 
                     `5` = c(85L, 28L), `6` = c(61L, 9L), 
                     `7` = c(31L, 2L), `8` = c(53L, 79L)), 
                     row.names = c(NA, -2L), class = "data.frame")

I have looked at this answer

Sum Values of Every Column in Data Frame with Conditional For Loop

but I don't know if I should use mutate and summarise_at

or summarise_if or case_when

Sorry about posting such a basic question - any help or advice would be greatly appreciated.

Upvotes: 0

Views: 182

Answers (1)

tjebo
tjebo

Reputation: 23807

Your data is messy. I recommend reshaping it for easier aggregation. One way is like this: (comments in the code)

mydf <- structure(list(  exp_sal = c(1, 1, NA, NA), curr_sal = c(    1, NA,    1, NA  ), `1` = c(59L, 33L, 237L, 244L), `2` = c(    98L, 199L, 127L,    178L  ), `3` = c(75L, 283L, 53L, 141L), `4` = c(    26L, 151L, 23L,    111L  ), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L),  `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(  NA,  -4L), class = "data.frame")

library(tidyverse) #also to load tidyr

mydf %>% gather(key, value, -exp_sal,-curr_sal) %>% # crucial step to make data long
  mutate(curr_val = ifelse(curr_sal == 1,value,NA),
         exp_val = ifelse(exp_sal == 1,value,NA)) %>% #this step actually cleans up the data and assigns a value to each new column for 'exp' and 'curr'
  group_by(key) %>% #for your summary, because you want to sum up your previous rows which are now assigned a key in a new column
  summarise_at( .vars = vars(curr_val, exp_val), .funs = sum, na.rm = TRUE)
#> # A tibble: 8 x 3
#>   key   curr_val exp_val
#>   <chr>    <int>   <int>
#> 1 1          296      92
#> 2 2          225     297
#> 3 3          128     358
#> 4 4           49     177
#> 5 5           28      85
#> 6 6            9      61
#> 7 7            2      31
#> 8 8           79      53

Created on 2019-11-17 by the reprex package (v0.2.1)

You can look at each intermediate step by removing the pipes. If you really need the data in the shape of your presented result, try t() But to be honest, I don't think this would be helpful for further analysis..

Upvotes: 1

Related Questions