Reputation: 19380
I have a data table in R:
name date
---- ----
John 1156649280
Adam 1255701960
...etc...
I want to get all of the rows that have a date within a range. In SQL, I might say SELECT * FROM mytable WHERE date > 5 AND date < 15
What is the equivalent in R, to select rows based on the range of values in a particular column?
Upvotes: 42
Views: 215774
Reputation: 41601
Another option using between
from dplyr like this (data from @andrie):
df <- data.frame(name=c("John", "Adam"), date=c(3, 5))
library(dplyr)
df %>%
filter(between(date, 4, 6))
#> name date
#> 1 Adam 5
df[between(df$date, 4, 6), ]
#> name date
#> 2 Adam 5
Created on 2023-03-11 with reprex v2.0.2
Upvotes: 0
Reputation: 6778
One should also consider another intuitive way to do this using filter()
from dplyr
. Here are some examples:
set.seed(123)
df <- data.frame(name = sample(letters, 100, TRUE),
date = sample(1:500, 100, TRUE))
library(dplyr)
filter(df, date < 50) # date less than 50
filter(df, date %in% 50:100) # date between 50 and 100
filter(df, date %in% 1:50 & name == "r") # date between 1 and 50 AND name is "r"
filter(df, date %in% 1:50 | name == "r") # date between 1 and 50 OR name is "r"
# You can also use the pipe (%>%) operator
df %>% filter(date %in% 1:50 | name == "r")
Upvotes: 5
Reputation: 179578
Construct some data
df <- data.frame( name=c("John", "Adam"), date=c(3, 5) )
Extract exact matches:
subset(df, date==3)
name date
1 John 3
Extract matches in range:
subset(df, date>4 & date<6)
name date
2 Adam 5
The following syntax produces identical results:
df[df$date>4 & df$date<6, ]
name date
2 Adam 5
Upvotes: 65
Reputation: 69241
Lots of options here, but one of the easiest to follow is subset
. Consider:
> set.seed(43)
> df <- data.frame(name = sample(letters, 100, TRUE), date = sample(1:500, 100, TRUE))
>
> subset(df, date > 5 & date < 15)
name date
11 k 10
67 y 12
86 e 8
You can also insert logic directly into the index for your data.frame. The comma separates the rows from columns. We just have to remember that R indexes rows first, then columns. So here we are saying rows with date > 5 & < 15 and then all columns:
df[df$date > 5 & df$date < 15 ,]
I'd also recommend checking out the help pages for subset, ?subset
and the logical operators ?"&"
Upvotes: 17