user971102
user971102

Reputation: 3075

group by a column and collapse all other columns without NA

I have a data frame and I am trying to collapse a bunch of columns by a group column called "id", but I get a lot of NA values. Is there a way to remove NA, or use only unique values in the paste function?

df<- data.frame(id=c(1, 1, 2, 3, 3, 3),
                bar=c('a', 'b', 'a', 'b', 'c', 'NA'),
                foo=c('NA', 'b', 'a', 'b', 'NA', 'NA'))

library(plyr)
df %>%
  group_by(id) %>%
  summarise_all(funs(unique(paste(., collapse = ",")))) 

I would like to remove the "NA"... Also, this functions takes longer than the basic aggregate, however I have not found the correct formula for the aggregate function either:

aggregate(df, by=list(df$id), paste, collapse=",")

Thanks for your help!

Upvotes: 2

Views: 1684

Answers (2)

acylam
acylam

Reputation: 18681

It's safer to first convert literal 'NA' values to true NA's:

library(dplyr)

df %>%
  mutate_all(~replace(., .=='NA', NA)) %>%
  group_by(id) %>%
  summarize_all(~paste(unique(na.omit(.)), collapse = ','))

Output:

# A tibble: 3 x 3
     id bar   foo  
  <dbl> <chr> <chr>
1     1 a,b   b    
2     2 a     a    
3     3 b,c   b 

Upvotes: 2

s_baldur
s_baldur

Reputation: 33488

You can add an ifelse():

df %>%
  group_by(id) %>%
  summarise_all(funs(unique(paste(ifelse(is.na(.), "", .), collapse = ",")))) 

Upvotes: 1

Related Questions