Reputation: 359
I am trying to keep only the records that have a date in col1 and filter out the rest. The problem is that the column does not have a fixed format, so I am using a regex pattern match with the dplyr package in R. I am able to filter out the text values, but not the record that contains just "2018". Any help would be really appreciated.
library(dplyr)
library(stringr)
data1 <- data.frame( c( "sds_ds", "2018/01/11", "02/04/2018","2018"), c( 2018, 76, 35,45), c( 2017, 79, 38,46 ))
names(data1) <- c("col1", "col2", "col3")
data1
data1_clean = data1 %>%
filter(!str_detect(col1, pattern = "[a-z]"))
data1_clean
Upvotes: 2
Views: 4038
Reputation: 887028
If we are filtering out rows that have only a year in 'col1', an option is to negate a match on exactly four digits:
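library(stringi)
library(stringr)  # provides str_detect()
library(dplyr)
data1 %>%
  filter(str_detect(col1, '[0-9/]'),                 # keep values containing a digit or a slash
         !stri_detect(col1, regex = "^[0-9]{4}$"))   # drop values that are only a four-digit year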
#        col1 col2 col3
#1 2018/01/11   76   79
#2 02/04/2018   35   38
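To see which condition drops which row, the two tests in the filter call can be evaluated on the values directly. This is a minimal base R sketch that uses grepl in place of str_detect and stri_detect; the names vals, has_digit_or_slash, year_only and keep are only for illustration and are not part of the answer.

```r
# Values of col1 from the question
vals <- c("sds_ds", "2018/01/11", "02/04/2018", "2018")

# Condition 1: the value contains at least one digit or slash
has_digit_or_slash <- grepl("[0-9/]", vals)

# Condition 2: the whole value is exactly four digits (a bare year)
year_only <- grepl("^[0-9]{4}$", vals)

# Keep values that pass condition 1 and are not a bare year
keep <- has_digit_or_slash & !year_only
vals[keep]
# [1] "2018/01/11" "02/04/2018"
```

"sds_ds" fails the first condition, and "2018" passes it but is removed by the negated second condition.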
Upvotes: 3
Reputation: 388862
We can define the regex based on the date formats that appear in the data and use it in filter:
library(dplyr)
data1 %>% filter(grepl("[0-9]{2,4}\\/[0-9]{2}\\/[0-9]{2,4}", col1))
#        col1 col2 col3
#1 2018/01/11   76   79
#2 02/04/2018   35   38
The equivalent in base R:
data1[grepl("[0-9]{2,4}\\/[0-9]{2}\\/[0-9]{2,4}", data1$col1), ]
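The pattern above is not anchored, so it also matches a date-like fragment inside a longer string. If that matters for the data, a stricter variant could look like the sketch below. It assumes the only two layouts are four digits first (as in 2018/01/11) or four digits last (as in 02/04/2018); the name date_pattern and the extra test value "id 12/34/56 x" are hypothetical and not from the question.

```r
# Values of col1 from the question, plus one made-up value for illustration
vals <- c("sds_ds", "2018/01/11", "02/04/2018", "2018", "id 12/34/56 x")

# Unanchored pattern from the answer (the slash needs no escaping)
loose_pattern <- "[0-9]{2,4}/[0-9]{2}/[0-9]{2,4}"

# Assumed stricter pattern: the whole string must be
# 4 digits/2 digits/2 digits or 2 digits/2 digits/4 digits
date_pattern <- "^([0-9]{4}/[0-9]{2}/[0-9]{2}|[0-9]{2}/[0-9]{2}/[0-9]{4})$"

grepl(loose_pattern, vals)
# [1] FALSE  TRUE  TRUE FALSE  TRUE
grepl(date_pattern, vals)
# [1] FALSE  TRUE  TRUE FALSE FALSE
```

Neither pattern checks that the digits form a real calendar date; they only check the shape of the string.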
Upvotes: 1