Kasper Van Lombeek

Reputation: 633

R: What is the most efficient way to select certain rows in a dataframe

From the data frame Question, with columns Question$Temperature and Question$Salary, I want to select only the Salary values where Temperature is higher than 10. I always do the following:

Question[Question$Temperature > 10, ]$Salary

Is there a cleaner way?

Upvotes: 0

Views: 135

Answers (2)

Ananta

Reputation: 3711

Three common ways, with benchmarks:

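library(microbenchmark)  # provides microbenchmark()

l <- data.frame(x = sample(1:10, 1000, replace = TRUE), y = runif(1000))
f1 <- function(df) df[df$x > 8, "y"]   # filter rows, pick the column by name
f2 <- function(df) df[df$x > 8, ]$y    # filter the whole data frame, then take y
f3 <- function(df) df$y[df$x > 8]      # take y first, then filter the vector
print(microbenchmark(f1(l), f2(l), f3(l), times = 1000))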

Result:

Unit: microseconds
  expr     min      lq  median      uq      max neval
 f1(l)  97.428 101.378 102.696 107.962 3757.555  1000
 f2(l) 247.081 253.226 257.614 270.780  734.659  1000
 f3(l)  59.686  62.319  63.197  64.514 3793.980  1000
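
The timings only matter if the three forms return the same values. A minimal sketch to check that, assuming a data frame built like the one in the benchmark (the seed and the names r1, r2, r3 are arbitrary choices for illustration, not part of the original benchmark):

```r
# Hypothetical equivalence check; not part of the original benchmark.
set.seed(1)  # arbitrary seed so the sample is reproducible
l <- data.frame(x = sample(1:10, 1000, replace = TRUE), y = runif(1000))

r1 <- l[l$x > 8, "y"]   # filter rows, pick the column by name
r2 <- l[l$x > 8, ]$y    # filter the whole data frame, then take y
r3 <- l$y[l$x > 8]      # take y first, then filter the vector

# TRUE when all three approaches yield the same numeric vector.
same <- identical(r1, r2) && identical(r1, r3)
print(same)
```

If same is TRUE, the choice between the three comes down to speed and readability.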

Upvotes: 1

Sven Hohenstein

Reputation: 81693

It's more efficient to use

Question$Salary[Question$Temperature > 10]

since you subset only a single vector rather than the whole data frame.
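
As a small illustration, here is that form applied to a toy data frame. The values below are made up, and high is just an illustrative variable name:

```r
# Toy data; the values are invented for illustration only.
Question <- data.frame(
  Temperature = c(5, 12, 20, 8),
  Salary      = c(1000, 2000, 3000, 4000)
)

# Index the Salary vector directly with a logical condition on Temperature.
high <- Question$Salary[Question$Temperature > 10]
print(high)  # [1] 2000 3000
```

Only the Salary column is touched here; no intermediate data frame is built.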

Upvotes: 1
