Reputation: 223
I have the following R problem. I ran an experiment and recorded the speeds of some cars. I have a table of cars (where, for example, 1 means Porsche, 2 means Volvo, and so on) and their speeds. A car may have been observed more than once; for example, the Porsche was observed three times and the Volvo twice.
exp<-data.frame(car=c(1,1,1,2,2,3),speed=c(10,20,30,40,50,60))
I would like to add a third column that holds, for every row, the maximum speed observed for that row's car, so that it looks like this:
exp<-data.frame(car=c(1,1,1,2,2,3),speed=c(10,20,30,40,50,60), maxSpeed=c(30,30,30,50,50,60))
The maximum observed speed for the Porsche was 30, so every Porsche row gets maxSpeed = 30.
I suspect this calls for an apply/sapply function, but I have no idea how to implement it. Anyone? :)
Upvotes: 2
Views: 125
Reputation: 118819
transform(exp, maxSpeed = ave(speed, car, FUN=max))
Another way, using split:
exp$maxSpeed <- exp$speed
split(exp$maxSpeed, exp$car) <- lapply(split(exp$maxSpeed, exp$car), max)
exp
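Since the question mentions the apply family, here is a minimal base-R sketch with tapply, which may be easier to follow than ave. The names cars_df and max_by_car are made up for this illustration (cars_df also avoids masking base::exp()); the data is the toy data from the question.

```r
# Toy data from the question, under a hypothetical name (cars_df)
# so that base::exp() is not masked.
cars_df <- data.frame(car   = c(1, 1, 1, 2, 2, 3),
                      speed = c(10, 20, 30, 40, 50, 60))

# tapply() computes one maximum per car; the result is named by car id.
max_by_car <- tapply(cars_df$speed, cars_df$car, max)

# Look up each row's car id in that named result to expand it back
# to one value per row; as.vector() drops the names and dim attributes.
cars_df$maxSpeed <- as.vector(max_by_car[as.character(cars_df$car)])

cars_df
```

ave does the "compute per group, then expand back to row length" step in a single call, which is why the one-liner above is usually preferable; the tapply version only makes the two steps explicit.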
Upvotes: 1
Reputation: 55380
This is very straightforward with data.table:
library(data.table)
exp <- data.table(exp)
exp[, maxSpeed := max(speed), by=car]
which gives:
exp
   car speed maxSpeed
1:   1    10       30
2:   1    20       30
3:   1    30       30
4:   2    40       50
5:   2    50       50
6:   3    60       60
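A variation on the same idea, as a sketch: setDT() converts an existing data.frame to a data.table in place, so no copy of the data is made, and := then adds the column by reference. The name cars_dt is made up for this illustration; the data is the toy data from the question.

```r
library(data.table)

# Toy data from the question, under a hypothetical name (cars_dt).
cars_dt <- data.frame(car   = c(1, 1, 1, 2, 2, 3),
                      speed = c(10, 20, 30, 40, 50, 60))

# Convert to a data.table in place (no copy, no reassignment needed).
setDT(cars_dt)

# Add the per-car maximum as a new column, by reference.
cars_dt[, maxSpeed := max(speed), by = car]

cars_dt
```

Because both steps work by reference, there is no need to assign the result back to cars_dt.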
Upvotes: 2
Reputation: 8753
@Arun, this is my result on a bigger sample (1000 records). The ratio of the medians is now (actually) 0.82:
exp <- data.frame(car=sample(1:10, 1000, T),speed=rnorm(1000, 20, 5))
f1 <- function() mutate(exp, maxSpeed = ave(speed, car, FUN=max))
f2 <- function() transform(exp, maxSpeed = ave(speed, car, FUN=max))
library(microbenchmark)
library(plyr)
> microbenchmark(f1(), f2(), times=1000)
Unit: microseconds
 expr     min      lq  median       uq      max neval
 f1() 551.321 565.112 570.565 589.9680 27866.23  1000
 f2() 662.933 683.138 689.552 713.7665 28510.24  1000
The plyr documentation itself says: "Mutate seems to be considerably faster than transform for large data frames."
However, for this case, you're probably right. If I enlarge the sample:
> exp <- data.frame(car=sample(1:1000, 100000, T),speed=rnorm(100000, 20, 5))
> microbenchmark(f1(), f2(), times=100)
Unit: milliseconds
 expr      min       lq   median       uq      max neval
 f1() 37.92438 39.00056 40.66607 41.18115 77.41645   100
 f2() 39.47731 40.28650 43.11927 43.70779 78.34878   100
The ratio gets close to one. To be honest, I was quite sure about plyr's performance (I always rely on it in my code), hence my 'claim' in the comment. It probably performs better in other situations.
EDIT: using f3() from @Arun's comment:
> microbenchmark(f1(), f2(), f3(), times=100)
Unit: milliseconds
 expr      min       lq   median       uq      max neval
 f1() 38.76050 39.57129 41.48728 42.14812 76.94338   100
 f2() 40.38913 41.19767 44.12329 44.78782 79.94021   100
 f3() 38.63606 39.58700 40.24272 42.04902 76.07551   100
Yep! slightly faster... moves less data?
Upvotes: 2