Reputation: 2619
I want to conditionally subset a data frame without repeating the data frame's name inside the condition. For example, if I have the following:
long_data_frame_name <- data.frame(x=1:10, y=1:10)
I want to say:
subset <- long_data_frame_name[x < 5,]
But instead, I have to say:
subset <- long_data_frame_name[long_data_frame_name$x < 5,]
plyr and ggplot handle this so beautifully. Is there any package that makes subsetting a data frame similarly beautiful?
Upvotes: 7
Views: 736
Reputation: 593
Try dplyr, released after this question was posted and answered. It is great for many common data frame munging tasks.
library(dplyr)
subset <- filter(long_data_frame_name, x < 5)
or, equivalently:
subset <- long_data_frame_name %>% filter(x < 5)
Upvotes: 3
Reputation: 16776
Yes:
newdata <- subset(mydata, sex=="m" & age > 25)
or
newdata <- subset(mydata, sex == "m" & age > 25, select = weight:income)
Reference: http://www.statmethods.net/management/subset.html
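Applied to the data frame from the question, this could look like the sketch below. The names small and small_with are made up for illustration, and the with() variant is only an alternative for when you want to keep bracket indexing:

```r
# The data frame from the question
long_data_frame_name <- data.frame(x = 1:10, y = 1:10)

# subset() evaluates the condition inside the data frame,
# so x can be written without the long_data_frame_name$ prefix
small <- subset(long_data_frame_name, x < 5)

# with() evaluates x < 5 inside the data frame as well,
# while the row selection itself still uses [ , ]
small_with <- long_data_frame_name[with(long_data_frame_name, x < 5), ]

print(small)
```

Note that ?subset describes it as a convenience function intended for interactive use; inside functions, plain [ indexing is usually the safer choice.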
Upvotes: 5
Reputation: 193517
Beauty is subjective, isn't it? In the interest of sharing other solutions, there's also the sqldf package:
library(sqldf)
subset <- sqldf("select * from long_data_frame_name where x < 5")
Upvotes: 4
Reputation: 162321
It sounds like you are looking for the data.table package, which implements indexing syntax just like that which you describe. (data.table objects are essentially data.frames with added functionality, so you can continue to use them almost anywhere you would use a "plain old" data.frame.)
Matthew Dowle, the package's author, argues for the advantages of [.data.table()'s indexing syntax in his answer to this popular SO [r]-tag question. His answer there could just as well have been written as a direct response to your question above!
Here's an example:
library(data.table)
long_data_table_name <- data.table(x=1:10, y=1:10)
subset <- long_data_table_name[x < 5, ]
subset
#    x y
# 1: 1 1
# 2: 2 2
# 3: 3 3
# 4: 4 4
Upvotes: 10