Reputation: 1057
I would like to merge the column values of only certain rows of my data frame. For example, given the following df,
  name time value
1   n1    1    10
2   n2    2    12
3    a    3     6
4    b    3    NA
5   n3    4     4
I would like to combine rows 3 & 4 so that the final df will be,
  name time value
1   n1    1    10
2   n2    2    12
3    a    3     6
5   n3    4     4
After trying out different approaches I settled on,
library(plyr)
df1 <- ddply(df,
             .(time), # Split by time, as events "a" and "b" always share the same time
             function(y){
               if(all(y$name %in% c("a","b"))){ # Don't combine rows without "a"|"b"
                 y <- data.frame(t(apply(y, 2, min, na.rm=T))) # adply doesn't seem to work?
                 print(y) # Added here for debugging
               }
               y
             }
)
The print statement produces the correct answer,
  name time value
1    a    3     6
but the output df1 is
  name time value
1   n1    1    10
2   n2    2    12
3    a    1     1
4   n3    4     4
I have no idea where the 1's came from.
Upvotes: 0
Views: 3406
Reputation: 99331
Why couldn't you use duplicated to remove the rows with repeated time values?
> dat
#   name time value
# 1   n1    1    10
# 2   n2    2    12
# 3    a    3     6
# 4    b    3    NA
# 5   n3    4     4
> dat[!duplicated(dat$time), ]
#   name time value
# 1   n1    1    10
# 2   n2    2    12
# 3    a    3     6
# 5   n3    4     4
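Note that duplicated keeps whichever row comes first within each time value, so the result above depends on the non-NA row appearing before the NA row. A minimal sketch of one way to make that explicit, assuming a hypothetical dat in which the NA row comes first and assuming it does not matter which name survives:

```r
# Hypothetical data: same rows as the question, but the NA row ("b") comes first
dat <- data.frame(name  = c("n1", "n2", "b", "a", "n3"),
                  time  = c(1, 2, 3, 3, 4),
                  value = c(10, 12, NA, 6, 4),
                  stringsAsFactors = FALSE)

# Sort by time, placing non-NA values before NA values within each time
ord <- dat[order(dat$time, is.na(dat$value)), ]

# Keep the first row per time, which is now a non-NA row whenever one exists
res <- ord[!duplicated(ord$time), ]
res
```

Without the ordering step, the time 3 group in this hypothetical dat would keep the "b" row with its NA value.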
Upvotes: 1
Reputation: 22293
The problem is type conversion. In your apply call, the data.frame is converted to a matrix of type character. When you convert it back to a data.frame, the character values are converted to factors (stringsAsFactors=TRUE was the default before R 4.0.0). Then, when the results are combined, the factors are converted to numeric, that is, to their integer level codes, which is where the 1's come from. To avoid the conversion to factors, you can use stringsAsFactors=FALSE and your code will work.
library(plyr)
df1 <- ddply(df,
             .(time), # Split by time, as events "a" and "b" always share the same time
             function(y){
               if(all(y$name %in% c("a","b"))){ # Don't combine rows without "a"|"b"
                 # Keep the character values as character instead of converting them to factors
                 y <- data.frame(t(apply(y, 2, min, na.rm=TRUE)), stringsAsFactors=FALSE)
               }
               y
             }
)
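The conversion chain described above can be reproduced in isolation, without plyr. This is only a sketch: grp is a hypothetical stand-in for the time 3 group from the question, and stringsAsFactors is passed explicitly so the behaviour does not depend on the R version.

```r
# Hypothetical two-row group mirroring the "a"/"b" rows of the question
grp <- data.frame(name = c("a", "b"), time = c(3, 3), value = c(6, NA),
                  stringsAsFactors = FALSE)

# apply() first coerces the data.frame to a matrix; because one column is
# character, every cell becomes character, and so does the result
m <- apply(grp, 2, min, na.rm = TRUE)
class(m)  # "character"

# Rebuilding a data.frame with stringsAsFactors = TRUE turns each column
# into a one-level factor
as_factor <- data.frame(t(m), stringsAsFactors = TRUE)
sapply(as_factor, class)

# The integer code of the only level is 1, whatever the label is
as.integer(as_factor$time)  # 1

# With stringsAsFactors = FALSE the values stay character and can be
# converted back to numbers without losing information
as_char <- data.frame(t(m), stringsAsFactors = FALSE)
as.numeric(as_char$time)
```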
Anyway, here's an alternative solution, which is a bit easier to read, less error-prone and probably faster.
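library(data.table)
dt <- data.table(df)
# For the "a"/"b" rows, overwrite name and value in place within each time group
dt[name %in% c("a","b"), `:=`(name=name[1], value=min(value, na.rm=TRUE)), by=time]
# The "a"/"b" rows are now identical, so unique() drops the duplicate
unique(dt)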
Upvotes: 0