Reputation: 125
I want to compare the data from one column which is the end date(end_date) with the system date(todays_date). Both columns are in the char format.
Input:
$ name: chr "Abby" "Abby" "Abby" "Abby" ...
$ std: int 2 3 4 5 6 7 8 9 10 11 ...
$ end_date: chr "25-02-2016" "25-02-2016" "25-03-2018" "25-02-2019" ...
$ todays_date: chr "07-03-2018" "07-03-2018" "07-03-2018" "07-03-2018" ...
Is there any way I can pass a sqldf
statement where I can get all the values of the input csv where end_date < todays_date? Any way other than a sqldf
statement where I can extract the values of the csv where end_date< todays_date will do.
I tried a few possible variations the below query but I can't seem to get the required output:
sel_a <- sqldf(paste("SELECT * FROM input_a WHERE end_date<",
todays_date, "", sep = ""))
sel_a
PS: I have a huge amount of data and have reduced it to fit this question.
Any help would be appreciated.
Upvotes: 1
Views: 787
Reputation: 269694
Using the raw input shown in the Note at the end first convert the dates to "Date"
class and then use any of the alternatives shown. The first two use end_date
in the input and the last two use Sys.Date()
. We show both sqldf
and base solutions.
library(sqldf)
fmt <- "%d-%m-%Y"
Input <- transform(Input_raw, end_date = as.Date(end_date, fmt),
todays_date = as.Date(todays_date, fmt))
# 1
sqldf("select * from Input where end_date <= todays_date")
# 2
subset(Input, end_date <= todays_date)
# 3
fn$sqldf("select * from Input where end_date <= `Sys.Date()`")
# 4
subset(Input, end_date <= Sys.Date())
The Input
in reproducible form:
Input_raw <- data.frame(name = "Abby", std = 2:5,
end_date = c("25-02-2016", "25-02-2016", "25-03-2018", "25-02-2019"),
todays_date = "07-03-2018", stringsAsFactors = FALSE)
Upvotes: 1
Reputation: 7620
To get a more specific answer, make a reproducible example
Convert the date column from character to date-time objects, e.g., with
library(lubridate)
your_df$end_date <- mdy(your_df$end_date)
Then, you don't even need a column for todays date, just use it as a filter condition
library(dplyr)
filter(your_df, end_date < Sys.Date())
# will return a data frame with those rows that have a date before today.
Or if you prefer:
your_df[your_df$end_date < Sys.Date(),]
# produces the same rows
Upvotes: 1