Reputation: 223

Adding maximum values from different levels to a new column in a data.frame

I have the following R problem. I ran an experiment and recorded the speeds of some cars. I have a table of cars (where 1 stands for, say, Porsche, 2 for Volvo, and so on) and their speeds. A car can be observed more than once; for example, the Porsche was observed three times and the Volvo twice.

exp<-data.frame(car=c(1,1,1,2,2,3),speed=c(10,20,30,40,50,60))

I would like to add a third column that holds, for every row, the maximum speed observed for that row's car. The result should look like this:

exp<-data.frame(car=c(1,1,1,2,2,3),speed=c(10,20,30,40,50,60), maxSpeed=c(30,30,30,50,50,60))

The maximum observed speed for the Porsche was 30, so every Porsche row gets maxSpeed = 30.

I think this calls for an apply/sapply function, but I have no idea how to implement it. Anyone? :)

Upvotes: 2

Answers (3)

Arun

Reputation: 118819

transform(exp, maxSpeed = ave(speed, car, FUN=max))
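
For readers unfamiliar with ave(): it applies FUN within each group and returns a vector as long as its input, so every row receives its group's value. A minimal base R sketch of the same idea, assuming the question's data (stored here under the illustrative name obs rather than exp), might look like this:

```r
# Sample data from the question; the name `obs` is chosen for this sketch only
obs <- data.frame(car = c(1, 1, 1, 2, 2, 3),
                  speed = c(10, 20, 30, 40, 50, 60))

# One maximum per car, returned with the car ids as names
group_max <- tapply(obs$speed, obs$car, max)

# Look up each row's car by name to spread the group maximum back over the rows;
# as.numeric() drops the names and array attributes left by tapply()
obs$maxSpeed <- as.numeric(group_max[as.character(obs$car)])

# Compare against the ave() one-liner
same_as_ave <- isTRUE(all.equal(obs$maxSpeed,
                                ave(obs$speed, obs$car, FUN = max)))
```

ave() does the grouping and the spreading in one call, which is why the one-liner above is usually preferable.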

Another way using split:

exp$maxSpeed <- exp$speed
split(exp$maxSpeed, exp$car) <- lapply(split(exp$maxSpeed, exp$car), max)
exp
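
The split version can be read as three steps: group, reduce, put back. The following is only a sketch of those steps written out one at a time; the name obs and the use of unsplit() are choices made for this illustration, not part of the answer's code:

```r
# Sample data from the question; the name `obs` is chosen for this sketch only
obs <- data.frame(car = c(1, 1, 1, 2, 2, 3),
                  speed = c(10, 20, 30, 40, 50, 60))

# Step 1: group the speeds by car into a named list
groups <- split(obs$speed, obs$car)

# Step 2: reduce each group to its maximum
maxima <- lapply(groups, max)

# Step 3: repeat each maximum to its group's length, then let unsplit()
# place the values back in the original row order
obs$maxSpeed <- unsplit(Map(rep, maxima, lengths(groups)), obs$car)
```

The replacement form `split(x, g) <- value` used above folds step 3 into the assignment, recycling each single maximum across its group.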

Upvotes: 1

Ricardo Saporta

Reputation: 55380

Very straightforward with data.table:

library(data.table)

exp <- data.table(exp)
exp[, maxSpeed := max(speed), by=car]

which gives:

exp
   car speed maxSpeed
1:   1    10       30
2:   1    20       30
3:   1    30       30
4:   2    40       50
5:   2    50       50
6:   3    60       60

Upvotes: 2

Michele

Reputation: 8753

@Arun this is my result on a bigger sample (1000 records). The ratio of the medians is now (actually) 0.82:

library(plyr)            # provides mutate()
library(microbenchmark)

exp <- data.frame(car=sample(1:10, 1000, TRUE), speed=rnorm(1000, 20, 5))

f1 <- function() mutate(exp, maxSpeed = ave(speed, car, FUN=max))
f2 <- function() transform(exp, maxSpeed = ave(speed, car, FUN=max))

> microbenchmark(f1(), f2(), times=1000)
Unit: microseconds
 expr     min      lq  median       uq      max neval
 f1() 551.321 565.112 570.565 589.9680 27866.23  1000
 f2() 662.933 683.138 689.552 713.7665 28510.24  1000

The plyr documentation itself says: "Mutate seems to be considerably faster than transform for large data frames."

However, for this case, you're probably right. If I enlarge the sample:

> exp <- data.frame(car=sample(1:1000, 100000, T),speed=rnorm(100000, 20, 5))
> microbenchmark(f1(), f2(), times=100)
Unit: milliseconds
 expr      min       lq   median       uq      max neval
 f1() 37.92438 39.00056 40.66607 41.18115 77.41645   100
 f2() 39.47731 40.28650 43.11927 43.70779 78.34878   100

The ratio gets close to one. To be honest, I was quite sure about plyr's performance (I always rely on it in my code), hence my 'claim' in the comment. It probably performs better in other situations.

EDIT:

Using f3() from @Arun's comment:

> microbenchmark(f1(), f2(), f3(), times=100)
Unit: milliseconds
 expr      min       lq   median       uq      max neval
 f1() 38.76050 39.57129 41.48728 42.14812 76.94338   100
 f2() 40.38913 41.19767 44.12329 44.78782 79.94021   100
 f3() 38.63606 39.58700 40.24272 42.04902 76.07551   100

Yep! Slightly faster... moves less data?

Upvotes: 2
