Reputation: 2619

Is there a better syntax for subsetting a data frame in R?

I want to conditionally subset a data frame without repeating the data frame's name inside the condition. For example, if I have the following:

long_data_frame_name <- data.frame(x=1:10, y=1:10)

I want to say:

subset <- long_data_frame_name[x < 5,]

But instead, I have to say:

subset <- long_data_frame_name[long_data_frame_name$x < 5,]

plyr and ggplot handle this so beautifully. Is there any package that makes subsetting a data frame similarly beautiful?

Upvotes: 7

Answers (4)

NC maize breeding Jim

Reputation: 593

Try dplyr, released after this question was posted and answered. It is great for many common data frame munging tasks.

library(dplyr)
subset <- filter(long_data_frame_name, x < 5)

or, equivalently:

subset <- long_data_frame_name %>% filter(x < 5)

Upvotes: 3

Davoud Taghawi-Nejad

Reputation: 16776

Yes:

newdata <- subset(mydata, sex=="m" & age > 25)

newdata <- subset(mydata, sex=="m" & age > 25 , select=weight:income)

Reference: http://www.statmethods.net/management/subset.html
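
Applied to the data frame from the question, the same call might look like the sketch below. The names `small` and `small_y` are only illustrative and do not come from the original post.

```r
# Data frame from the question
long_data_frame_name <- data.frame(x = 1:10, y = 1:10)

# subset() evaluates the condition inside the data frame, so x refers
# to the column without the long_data_frame_name$ prefix
small <- subset(long_data_frame_name, x < 5)

# Optionally keep only some columns with the select argument
small_y <- subset(long_data_frame_name, x < 5, select = y)
```

Note that the help page for subset() describes it as a convenience function for interactive use, so inside functions or packages the standard `[` indexing may still be the safer choice.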

Upvotes: 5

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Beauty is subjective, isn't it? In the interest of sharing other solutions, there's also the sqldf package:

library(sqldf)
subset <- sqldf("select * from long_data_frame_name where x < 5")

Upvotes: 4

Josh O'Brien

Reputation: 162321

It sounds like you are looking for the data.table package, which implements indexing syntax just like that which you describe. (data.table objects are essentially data.frames with added functionality, so you can continue to use them almost anywhere you would use a "plain old" data.frame.)

Matthew Dowle, the package's author, argues for the advantages of [.data.table()'s indexing syntax in his answer to this popular SO [r]-tag question. His answer there could just as well have been written as a direct response to your question above!

Here's an example:

library(data.table)
long_data_table_name <- data.table(x=1:10, y=1:10) 

subset <- long_data_table_name[x < 5, ]
subset
#    x y
# 1: 1 1
# 2: 2 2
# 3: 3 3
# 4: 4 4
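
If the data already lives in a plain data.frame, as in the question, one possible approach is to convert it first. This is a minimal sketch assuming the data.table package is installed; the names `dt` and `small` are illustrative only.

```r
library(data.table)

# Data frame from the question
long_data_frame_name <- data.frame(x = 1:10, y = 1:10)

# as.data.table() returns a data.table copy; the original data.frame is unchanged
dt <- as.data.table(long_data_frame_name)

# Columns can now be referenced directly inside [ ]
small <- dt[x < 5]

# The result is still a data.frame, so functions expecting one should accept it
is.data.frame(small)
```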

Upvotes: 10
