Chris Ruehlemann
Chris Ruehlemann

Reputation: 21440

How to subset dataframe on columns with dates

I have this illustrative dataframe:

df <- data.frame(Customer_Name = c("A", "B"), 
                 Country = c("X", "Y"), 
                 Transaction = c("01-01-2009", "01-31-2012"), 
                 Delivery = c("01-02-2009", "02-01-2012"), 
                 Return = c("01-27-2009", "03-22-2012"), 
                 Cost = c(1234, 5678))

I'd like to subset df on those columns that contain dates. I'm doing fine with identifying the relevant columns with grepl:

apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}", x))
     Customer_Name Country Transaction Delivery Return  Cost
[1,]         FALSE   FALSE        TRUE     TRUE   TRUE FALSE
[2,]         FALSE   FALSE        TRUE     TRUE   TRUE FALSE

But as soon as I try to subset df accordingly, I get an error:

df[, apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}", x))]
Error in `[.data.frame`(df, , apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}",  : 
  undefined columns selected

How can I subset df on the date columns properly?

Upvotes: 0

Views: 34

Answers (1)

IceCreamToucan
IceCreamToucan

Reputation: 28705

df[sapply(df, function(x) all(grepl("^\\d{2}-\\d{2}-\\d{4}$", x)))]
#   Transaction   Delivery     Return
# 1  01-01-2009 01-02-2009 01-27-2009
# 2  01-31-2012 02-01-2012 03-22-2012

Or with dplyr

library(dplyr)

df %>% 
  select_if(~ all(grepl("^\\d{2}-\\d{2}-\\d{4}$", .)))
#   Transaction   Delivery     Return
# 1  01-01-2009 01-02-2009 01-27-2009
# 2  01-31-2012 02-01-2012 03-22-2012

Upvotes: 2

Related Questions