Reputation: 111
This is my input data:
structure(list(exp_sal = c(1, 1, NA, NA), curr_sal = c(1, NA,
1, NA), `1` = c(59L, 33L, 237L, 244L), `2` = c(98L, 199L, 127L,
178L), `3` = c(75L, 283L, 53L, 141L), `4` = c(26L, 151L, 23L,
111L), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L),
`7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c(NA,
-4L), class = "data.frame")
I want each column to have summary counts based on conditions: If exp_sal is not NA, then add up each column If curr_sal is not NA, then add up each column
The result:
I would like to have rows 1 and 3 summarised for exp_sal and rows 1 and 3 summarised for curr_sal and row 4 dropped entirely.
The result I want:
result <- structure(list(exp_sal = c(1, NA), curr_sal = c(NA, 1),
`1` = c(97L, 296L), `2` = c(297L, 225L),
`3` = c(358L, 128L), `4` = c(177L, 49L),
`5` = c(85L, 28L), `6` = c(61L, 9L),
`7` = c(31L, 2L), `8` = c(53L, 79L)),
row.names = c(NA, -2L), class = "data.frame")
I have looked at this answer
Sum Values of Every Column in Data Frame with Conditional For Loop
but I don't know if I should use mutate and summarise_at
or summarise_if or case_when
Sorry about posting such a basic question - any help or advice would be greatly appreciated.
Upvotes: 0
Views: 182
Reputation: 23807
Your data is messy. I recommend reshaping it for easier aggregation. One way is like this: (comments in the code)
mydf <- structure(list( exp_sal = c(1, 1, NA, NA), curr_sal = c( 1, NA, 1, NA ), `1` = c(59L, 33L, 237L, 244L), `2` = c( 98L, 199L, 127L, 178L ), `3` = c(75L, 283L, 53L, 141L), `4` = c( 26L, 151L, 23L, 111L ), `5` = c(8L, 77L, 20L, 29L), `6` = c(4L, 57L, 5L, 25L), `7` = c(1L, 30L, 1L, NA), `8` = c(32L, 21L, 47L, NA)), row.names = c( NA, -4L), class = "data.frame")
library(tidyverse) #also to load tidyr
mydf %>% gather(key, value, -exp_sal,-curr_sal) %>% # crucial step to make data long
mutate(curr_val = ifelse(curr_sal == 1,value,NA),
exp_val = ifelse(exp_sal == 1,value,NA)) %>% #this step actually cleans up the data and assigns a value to each new column for 'exp' and 'curr'
group_by(key) %>% #for your summary, because you want to sum up your previous rows which are now assigned a key in a new column
summarise_at( .vars = vars(curr_val, exp_val), .funs = sum, na.rm = TRUE)
#> # A tibble: 8 x 3
#> key curr_val exp_val
#> <chr> <int> <int>
#> 1 1 296 92
#> 2 2 225 297
#> 3 3 128 358
#> 4 4 49 177
#> 5 5 28 85
#> 6 6 9 61
#> 7 7 2 31
#> 8 8 79 53
Created on 2019-11-17 by the reprex package (v0.2.1)
You can look at each intermediate step by removing the pipes.
If you really need the data in the shape of your presented result, try t()
But to be honest, I don't think this would be helpful for further analysis..
Upvotes: 1