Reputation: 21440
I have this illustrative dataframe:
df <- data.frame(Customer_Name = c("A", "B"),
Country = c("X", "Y"),
Transaction = c("01-01-2009", "01-31-2012"),
Delivery = c("01-02-2009", "02-01-2012"),
Return = c("01-27-2009", "03-22-2012"),
Cost = c(1234, 5678))
I'd like to subset df
on those columns that contain dates. I'm doing fine with identifying the relevant columns with grepl
:
apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}", x))
Customer_Name Country Transaction Delivery Return Cost
[1,] FALSE FALSE TRUE TRUE TRUE FALSE
[2,] FALSE FALSE TRUE TRUE TRUE FALSE
But as soon as I try to subset df
accordingly, I get an error:
df[, apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}", x))]
Error in `[.data.frame`(df, , apply(df, 2, function(x) grepl("\\d{2}-\\d{2}-\\d{4}", :
undefined columns selected
How can I subset df
on the date columns properly?
Upvotes: 0
Views: 34
Reputation: 28705
df[sapply(df, function(x) all(grepl("^\\d{2}-\\d{2}-\\d{4}$", x)))]
# Transaction Delivery Return
# 1 01-01-2009 01-02-2009 01-27-2009
# 2 01-31-2012 02-01-2012 03-22-2012
Or with dplyr
library(dplyr)
df %>%
select_if(~ all(grepl("^\\d{2}-\\d{2}-\\d{4}$", .)))
# Transaction Delivery Return
# 1 01-01-2009 01-02-2009 01-27-2009
# 2 01-31-2012 02-01-2012 03-22-2012
Upvotes: 2