Reputation: 233
How can I use the aggregate command to convert this table:
name ID
a 1
a 2
a 2
a NA
b NA
c NA
c NA
to this one:
name ID
a 1|2
b NA
c NA
Thanks.
Upvotes: 1
Views: 249
Reputation: 42679
In base:
aggregate(ID ~ name, data = x,
          FUN = function(y) paste(unique(y), collapse = '|'),
          na.action = na.pass)
name ID
1 a 1|2|NA
2 b NA
3 c NA
This differs from your specification in the handling of the fourth row (a, NA): the NA is kept in the pasted string for group a, giving 1|2|NA instead of 1|2.
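If the NA in group a should be dropped, as in the question's target table, one option is to strip NAs inside the function passed to aggregate and return NA only when nothing is left. This is a minimal sketch, not a tested drop-in: the data frame x is reconstructed from the question, and collapse_ids is a hypothetical helper name.

```r
# Sample data reconstructed from the question (assumed to be what `x` holds).
x <- data.frame(
  name = c("a", "a", "a", "a", "b", "c", "c"),
  ID   = c(1, 2, 2, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: paste the unique non-NA values,
# or return NA if the group has no non-NA values at all.
collapse_ids <- function(y) {
  y <- unique(y[!is.na(y)])
  if (length(y) == 0) NA_character_ else paste(y, collapse = "|")
}

# na.action = na.pass keeps the NA rows so that groups b and c survive.
res <- aggregate(ID ~ name, data = x, FUN = collapse_ids, na.action = na.pass)
res
```

Here group a should collapse to 1|2, while b and c stay NA because every ID in those groups is missing.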
Upvotes: 4
Reputation: 887721
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'name', and if the elements in 'ID' are all NA, return NA; otherwise paste the unique elements of the 'ID' column that are not NA.
library(data.table)
setDT(df1)[,.(ID= if(all(is.na(ID))) NA_character_ else
paste(na.omit(unique(ID)), collapse = "|")), by = name]
# name ID
#1: a 1|2
#2: b NA
#3: c NA
The same approach can be used in dplyr:
library(dplyr)
df1 %>%
group_by(name) %>%
summarise(ID = if(all(is.na(ID))) NA_character_
else paste(unique(ID[!is.na(ID)]), collapse="|"))
# name ID
# <chr> <chr>
#1 a 1|2
#2 b <NA>
#3 c <NA>
Upvotes: 3
Reputation: 215117
You can try:
library(tidyr)
df$name <- as.factor(df$name)
aggregate(ID ~ name, unique(df[complete.cases(df),]), paste, collapse = "|") %>%
complete(name)
Source: local data frame [3 x 2]
name ID
(fctr) (chr)
1 a 1|2
2 b NA
3 c NA
The logic here is to first filter out all incomplete and duplicated rows, then paste the IDs together, and finally use the complete function from the tidyr package to fill in every level of the factor variable. That last step restores the groups that lost all their rows in the filtering (b and c here), with ID set to NA, so no information is missing.
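The same filter, paste, and restore steps can also be sketched in base R alone, with merge standing in for tidyr's complete. This is an illustrative variant rather than the method above; the data frame df is reconstructed from the question, and kept, agg and res are names chosen here.

```r
# Sample data reconstructed from the question (assumed contents of `df`).
df <- data.frame(
  name = c("a", "a", "a", "a", "b", "c", "c"),
  ID   = c(1, 2, 2, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with a missing ID, then drop duplicated rows.
kept <- unique(df[complete.cases(df), ])

# Step 2: paste the remaining IDs within each name.
agg <- aggregate(ID ~ name, data = kept, FUN = paste, collapse = "|")

# Step 3: left-join onto the full set of names so that groups with
# no remaining rows come back with ID = NA.
res <- merge(data.frame(name = unique(df$name)), agg, all.x = TRUE)
res
```

The left join plays the role of complete(name): any name with no surviving rows gets an NA in ID instead of disappearing from the result.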
Upvotes: 3