aggregate command in R

How can I use the aggregate command to convert this table:

name  ID   
a     1   
a     2 
a     2
a     NA   
b     NA   
c     NA
c     NA

to this one:

name  ID   
a     1|2    
b     NA   
c     NA

Thanks.

Answers (3)

Matthew Lundberg

In base:

> aggregate(ID ~ name, data = x,
+           FUN = function(y) paste(unique(y), collapse = '|'),
+           na.action = na.pass)
  name     ID
1    a 1|2|NA
2    b     NA
3    c     NA

This differs from your specification in how the fourth row (a, NA) is handled: its NA is pasted into the result for a, giving 1|2|NA rather than 1|2.
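If you want aggregate to match the requested output exactly, one option is to drop the NAs inside FUN and return NA only when a group has no non-missing IDs. This is a sketch, not part of the answer above: it assumes the data frame is called df (the question does not name it) and uses a hypothetical helper, collapse_ids.

```r
# Sample data from the question (the name `df` is an assumption)
df <- data.frame(name = c("a", "a", "a", "a", "b", "c", "c"),
                 ID   = c(1, 2, 2, NA, NA, NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: paste the unique non-NA IDs, or return NA if there are none
collapse_ids <- function(y) {
  y <- unique(y[!is.na(y)])
  if (length(y) == 0) NA_character_ else paste(y, collapse = "|")
}

# na.pass keeps rows with a missing ID, so the all-NA groups (b and c)
# are not dropped before FUN is applied
res <- aggregate(ID ~ name, data = df, FUN = collapse_ids, na.action = na.pass)
res
```

For a this yields 1|2, while b and c get NA. Without na.action = na.pass, the formula method's default na.omit would remove b and c entirely.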

akrun

We can use data.table. Convert the data.frame to a data.table (setDT(df1)) and group by 'name'. If all the elements of 'ID' in a group are NA, return NA; otherwise, paste together the unique non-NA elements of 'ID'.

library(data.table)
setDT(df1)[,.(ID= if(all(is.na(ID))) NA_character_ else 
           paste(na.omit(unique(ID)), collapse = "|")), by = name]
#   name  ID
#1:    a 1|2
#2:    b  NA
#3:    c  NA

The same approach can be used in dplyr:

library(dplyr)
df1 %>%
   group_by(name) %>% 
   summarise(ID = if(all(is.na(ID))) NA_character_ 
                   else paste(unique(ID[!is.na(ID)]), collapse="|"))
#   name    ID
#  <chr> <chr>
#1     a   1|2
#2     b  <NA>
#3     c  <NA>

akuiper

You can try:

library(tidyr)
df$name <- as.factor(df$name)

aggregate(ID ~ name, unique(df[complete.cases(df),]), paste, collapse = "|") %>% 
   complete(name)

Source: local data frame [3 x 2]

    name    ID
  (fctr) (chr)
1      a   1|2
2      b    NA
3      c    NA

The logic is: first filter out incomplete and duplicated rows, then paste the IDs together, and finally use the complete() function from the tidyr package to fill in every level of the factor variable, so that names whose IDs are all NA (b and c) are not lost.
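The same three steps can also be written in base R alone. This is a sketch, not the answer's code: it assumes the data frame is called df, and it replaces tidyr::complete() with a match() against the factor levels.

```r
# Sample data from the question (the name `df` is an assumption)
df <- data.frame(name = factor(c("a", "a", "a", "a", "b", "c", "c")),
                 ID   = c(1, 2, 2, NA, NA, NA, NA))

# Step 1: drop rows containing any NA, then drop duplicated rows
clean <- unique(df[complete.cases(df), ])

# Step 2: paste the remaining IDs together within each name
pasted <- aggregate(ID ~ name, data = clean, FUN = paste, collapse = "|")

# Step 3: base-R stand-in for tidyr::complete(): one row per factor level,
# with NA for levels that had no complete rows
res <- data.frame(name = levels(df$name), stringsAsFactors = FALSE)
res$ID <- pasted$ID[match(res$name, pasted$name)]
res
```

Unlike the tidyr version, this returns name as a character column rather than a factor.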
