Reputation: 51
I have this data set:

d1 = data.frame(PatID = c(1,1,1,2,2,4,1,2),
                code1 = c(1,2,3,1,2,7,2,8),
                location = c('a','b','c','d','e','f','g','h'))

I want to collapse duplicate rows (on PatID), so the output should be:
PatID code1 location
1 1,2,3 a
2 1,2,8 d
4 7 f
I have tried aggregate, ddply, and even melt/dcast, without success. I am a former Unix programmer but new to R.
Upvotes: 3
Views: 2530
Reputation: 193667
Just to make sure base R isn't entirely ignored (or to make you appreciate the syntax of "plyr" and "data.table" for these types of problems)...
Two options:
ave to do the "aggregation" and unique to slim down the output:

unique(within(d1, {
  code1 <- ave(code1, PatID,
               FUN = function(x) paste(unique(x), collapse = ","))
  location <- ave(location, PatID, FUN = function(x) x[1])
}))
# PatID code1 location
# 1 1 1,2,3 a
# 4 2 1,2,8 d
# 6 4 7 f
aggregate and merge working together:

merge(
  aggregate(code1 ~ PatID, d1,
            function(x) paste(unique(x), collapse = ",")),
  aggregate(location ~ PatID, d1, function(x) x[1]))
# PatID code1 location
# 1 1 1,2,3 a
# 2 2 1,2,8 d
# 3 4 7 f
The closest purely aggregate solution I can think of is as follows:
aggregate(cbind(code1, as.character(location)) ~ PatID, d1,
function(x) cbind(paste(unique(x), collapse = ","),
as.character(x[1])))
# PatID code1.1 code1.2 V2.1 V2.2
# 1 1 1,2,3 1 a,b,c,g a
# 2 2 1,2,8 1 d,e,h d
# 3 4 7 7 f f
It gives you all the information you were interested in, and a decent amount of information you weren't interested in too...
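If you only want the original three columns back from that result, you can index into the matrix columns that aggregate() produces (a sketch; the column names code1 and V2 come from the call above):

```r
# The aggregate() call above returns matrix columns named "code1" and "V2".
# Pick the collapsed codes (first sub-column of code1) and the first
# location per PatID (second sub-column of V2).
res <- aggregate(cbind(code1, as.character(location)) ~ PatID, d1,
                 function(x) cbind(paste(unique(x), collapse = ","),
                                   as.character(x[1])))
data.frame(PatID = res$PatID,
           code1 = res$code1[, 1],
           location = res$V2[, 2])
#   PatID code1 location
# 1     1 1,2,3        a
# 2     2 1,2,8        d
# 3     4     7        f
```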
Upvotes: 1
Reputation: 173677
ddply (from the plyr package) works just fine:

library(plyr)
ddply(d1, .(PatID),
      summarise,
      code1 = paste(unique(code1), collapse = ','),
      location = location[1])
PatID code1 location
1 1 1,2,3 a
2 2 1,2,8 d
3 4 7 f
Oh fine. Here's the data.table version:
library(data.table)
d2 <- as.data.table(d1)
d2[, list(code1 = paste(unique(code1), collapse = ','),
          location = location[1]),
   by = 'PatID']
PatID code1 location
1: 1 1,2,3 a
2: 2 1,2,8 d
3: 4 7 f
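For completeness, the same group-and-summarise pattern also reads naturally in dplyr (not used elsewhere in this thread; a sketch assuming the dplyr package is installed):

```r
# Hypothetical dplyr alternative: one row per PatID, codes collapsed
# into a comma-separated string, first location kept.
library(dplyr)

d1 %>%
  group_by(PatID) %>%
  summarise(code1 = paste(unique(code1), collapse = ","),
            location = first(location))
```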
Upvotes: 4