jMathew

Reputation: 1057

Merging specific rows in R

I would like to merge the column values for only certain rows of my data frame. For example, in the following df,

  name time value
1   n1    1    10
2   n2    2    12
3    a    3     6
4    b    3    NA
5   n3    4     4

I would like to combine rows 3 & 4 so that the final df will be,

  name time value
1   n1    1    10
2   n2    2    12
3    a    3     6
5   n3    4     4

My Method

After trying out different approaches I settled on,

df1 <- ddply(df,
             .(time), # Split by time, as events "a" and "b" will always share the same time
             function(y){
               if(all(y$name %in% c("a","b"))){ # Don't combine rows without "a" or "b"
                 y <- data.frame(t(apply(y, 2, min, na.rm=TRUE))) # adply doesn't seem to work?
                 print(y) # Added here for debugging
               }
               y
             }
)

The print statement produces the correct answer,

  name time value
1    a    3     6

but the output df1 is

  name time value
1   n1    1    10
2   n2    2    12
3    a    1     1
4   n3    4     4

I have no idea where the 1's in the a row come from.

Upvotes: 0

Views: 3406

Answers (2)

Rich Scriven

Reputation: 99331

Why not use duplicated() to remove the rows with repeated time values?

> dat
#   name time value
# 1   n1    1    10
# 2   n2    2    12
# 3    a    3     6
# 4    b    3    NA
# 5   n3    4     4
> dat[!duplicated(dat$time), ]
#   name time value
# 1   n1    1    10
# 2   n2    2    12
# 3    a    3     6
# 5   n3    4     4
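
Note that duplicated() keeps the first row of each time group, so the result above depends on the row holding a value appearing before the NA row. As a minimal sketch, assuming the rows might arrive in the other order, the data can be sorted first so that NA rows come last within each time value. The sample data below is the question's df with rows a and b swapped (an assumption made for illustration), and like the original answer this drops every repeated time, not only the a/b rows.

```r
# Question's data, but with the NA row ("b") placed before "a" (illustrative assumption)
dat <- data.frame(
  name  = c("n1", "n2", "b", "a", "n3"),
  time  = c(1, 2, 3, 3, 4),
  value = c(10, 12, NA, 6, 4),
  stringsAsFactors = FALSE
)

# Plain duplicated() would keep the "b" row, whose value is NA
naive <- dat[!duplicated(dat$time), ]

# Sort by time, then put NA values last within each time, before de-duplicating
ord <- dat[order(dat$time, is.na(dat$value)), ]
res <- ord[!duplicated(ord$time), ]
```

Here res keeps the row that has a value for time 3, while naive keeps the NA row.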

Upvotes: 1

shadow

Reputation: 22293

The problem is type conversion. In your apply call, the data.frame is converted to a matrix of type character. When you convert the result back to a data.frame, the character values are converted to factors. Then, when the results are combined, the factors are converted to numeric, which yields their integer level codes rather than the original values (hence the 1's). To avoid the conversion to factors, you can use stringsAsFactors=FALSE and your code will work.

library(plyr)
df1 <- ddply(df,
             .(time), # Split by time, as events "a" and "b" will always share the same time
             function(y){
               if(all(y$name %in% c("a","b"))){ # Don't combine rows without "a" or "b"
                 # Keep the apply() result as character instead of factor
                 y <- data.frame(t(apply(y, 2, min, na.rm=TRUE)), stringsAsFactors=FALSE)
               }
               y
             }
)
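
The conversion chain described above can be reproduced in isolation. This is a small sketch, not part of the original code: y stands in for the two-row a/b group, and the variable names are made up for illustration. It shows that apply() returns character data for a mixed-type data frame, and that converting a factor directly to numeric gives the level code rather than the value.

```r
# Stand-in for the a/b group that ddply passes to the function (illustrative)
y <- data.frame(name = c("a", "b"), time = c(3, 3), value = c(6, NA),
                stringsAsFactors = FALSE)

# apply() coerces the data frame to a matrix; with a character column present,
# the whole matrix (and therefore the result) is character
m <- apply(y, 2, min, na.rm = TRUE)
result_type <- typeof(m)                   # "character"

# A factor stores integer codes; as.numeric() returns the code, not the label
f <- factor("6")
code <- as.numeric(f)                      # 1
value_back <- as.numeric(as.character(f))  # 6
```

Going through as.character() first is the usual way to recover the original number from a factor.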

Anyway, here's an alternative solution, which is a bit easier to read, less error-prone and probably faster.

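library(data.table)
dt <- as.data.table(df)
# For rows named "a" or "b", overwrite name and value within each time group
dt[name %in% c("a","b"), `:=`(name=name[1], value=min(value, na.rm=TRUE)), by=time]
# The a/b rows are now identical, so unique() drops the repeat
unique(dt)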
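
If adding a package is not an option, the same idea can be sketched in base R. This is only an illustration under assumptions: the df below is rebuilt from the question's printed data, and helper names such as is_ab and merged are invented here. It collapses only the a/b rows per time value and leaves all other rows untouched.

```r
# Question's data, reconstructed from the printed df (assumed column types)
df <- data.frame(
  name  = c("n1", "n2", "a", "b", "n3"),
  time  = c(1, 2, 3, 3, 4),
  value = c(10, 12, 6, NA, 4),
  stringsAsFactors = FALSE
)

# Rows eligible for merging
is_ab <- df$name %in% c("a", "b")

# Collapse each time group of a/b rows into a single row
groups <- split(df[is_ab, ], df$time[is_ab])
merged <- do.call(rbind, lapply(groups, function(g) {
  data.frame(name = g$name[1], time = g$time[1],
             value = min(g$value, na.rm = TRUE),
             stringsAsFactors = FALSE)
}))

# Recombine with the untouched rows and restore the time order
out <- rbind(df[!is_ab, ], merged)
out <- out[order(out$time), ]
rownames(out) <- NULL
```

One caveat of using min() with na.rm = TRUE, here as in the data.table version: a group whose values are all NA yields Inf with a warning.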

Upvotes: 0
