Reputation: 3075
I have a data frame and I am trying to collapse a bunch of columns by a group column called "id", but I get a lot of NA values. Is there a way to remove NA, or use only unique values in the paste function?
df<- data.frame(id=c(1, 1, 2, 3, 3, 3),
bar=c('a', 'b', 'a', 'b', 'c', 'NA'),
foo=c('NA', 'b', 'a', 'b', 'NA', 'NA'))
library(plyr)
df %>%
group_by(id) %>%
summarise_all(funs(unique(paste(., collapse = ","))))
I would like to remove the "NA"... Also, this functions takes longer than the basic aggregate, however I have not found the correct formula for the aggregate function either:
aggregate(df, by=list(df$id), paste, collapse=",")
Thanks for your help!
Upvotes: 2
Views: 1684
Reputation: 18681
It's safer to first convert literal 'NA' values to true NA's:
library(dplyr)
df %>%
mutate_all(~replace(., .=='NA', NA)) %>%
group_by(id) %>%
summarize_all(~paste(unique(na.omit(.)), collapse = ','))
Output:
# A tibble: 3 x 3
id bar foo
<dbl> <chr> <chr>
1 1 a,b b
2 2 a a
3 3 b,c b
Upvotes: 2
Reputation: 33488
You can add an ifelse():
df %>%
group_by(id) %>%
summarise_all(funs(unique(paste(ifelse(is.na(.), "", .), collapse = ","))))
Upvotes: 1