Reputation: 47
I have a table that (after some initial processing) has multiple rows with the same main identifier but with different column values (either 0 or a value > 0).
Example table with main identifier "produce"
df = data.frame(produce = c("apples","apples", "bananas","bananas"),
grocery1=c(0,1,1,1),
grocery2=c(1,0,1,1),
grocery3=c(0,0,1,1))
###########################
> df
produce grocery1 grocery2 grocery3
1 apples 0 1 0
2 apples 1 0 0
3 bananas 1 1 1
4 bananas 1 1 1
I'd like to collapse (or merge?) rows with the same identifier and retain the non-empty (here, any nonzero values) values in each column
Example desired output
shopping grocery1 grocery2 grocery3
1 apples 1 1 0
2 bananas 1 1 1
Is there a simple function or piping in tidyverse that I am missing and that can handle this?
Upvotes: 3
Views: 1356
Reputation: 887901
We can use max
library(dplyr)
df %>%
group_by(produce) %>%
summarise_all(max)
# A tibble: 2 x 4
# produce grocery1 grocery2 grocery3
# <fct> <dbl> <dbl> <dbl>
#1 apples 1 1 0
#2 bananas 1 1 1
Upvotes: 1
Reputation: 389275
Using base R aggregate
we can do
aggregate(.~produce, df, function(x) +any(x > 0))
# produce grocery1 grocery2 grocery3
#1 apples 1 1 0
#2 bananas 1 1 1
Or using dplyr
library(dplyr)
df %>%
group_by(produce) %>%
summarise_all(~+any(. > 0))
# produce grocery1 grocery2 grocery3
# <fct> <int> <int> <int>
#1 apples 1 1 0
#2 bananas 1 1 1
and same with data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) +any(x > 0)), by=produce]
Upvotes: 2