Akash

Reputation: 359

Filter out records with a pattern in R dataframe using dplyr with regex

I am trying to keep only the records that have a date in col1 and filter out all other records. The problem is that the variable does not have a fixed format, so I am using a regex pattern match with the dplyr package in R. I am able to filter the text values out of the dataset, but I am not able to filter out the record that contains just "2018". Any help would be really appreciated.

library(dplyr)
library(stringr)
data1 <- data.frame( c( "sds_ds", "2018/01/11", "02/04/2018","2018"), c( 2018, 76, 35,45), c( 2017, 79, 38,46 ))
names(data1) <- c("col1", "col2", "col3")
data1

data1_clean = data1 %>% 
  filter(!str_detect(col1, pattern = "[a-z]"))
data1_clean

Upvotes: 2

Views: 4038

Answers (2)

akrun

Reputation: 887028

If we are filtering out rows that have only a year in 'col1', an option is to negate a match on values that consist of exactly four digits:

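library(stringi)
library(stringr)  # provides str_detect()
library(dplyr)
data1 %>% 
   # keep values that contain a digit or a slash (drops "sds_ds")
   filter(str_detect(col1, '[0-9/]'), 
          # drop values that are exactly four digits (drops "2018")
          !stri_detect(col1, regex = "^[0-9]{4}$"))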

#         col1 col2 col3
#1 2018/01/11   76   79
#2 02/04/2018   35   38

Upvotes: 3

Ronak Shah

Reputation: 388862

We can define the regex based on the date formats present in the data and use it in filter:

library(dplyr)

data1 %>% filter(grepl("[0-9]{2,4}\\/[0-9]{2}\\/[0-9]{2,4}", col1))


#        col1 col2 col3
#1 2018/01/11   76   79
#2 02/04/2018   35   38

The equivalent in base R:

data1[grepl("[0-9]{2,4}\\/[0-9]{2}\\/[0-9]{2,4}", data1$col1), ]
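The pattern above is unanchored, so it would also match a value that merely contains something date-like (for example "ref 2018/01/11x"). A minimal base R sketch of a stricter variant follows, assuming the only formats that should count as dates are yyyy/mm/dd and dd/mm/yyyy (or mm/dd/yyyy). The names date_pattern and data1_dates are illustrative, not from the question, and the regex checks the shape of the string only, not whether it is a valid calendar date.

```r
# Rebuild the example data from the question (base R only).
data1 <- data.frame(
  col1 = c("sds_ds", "2018/01/11", "02/04/2018", "2018"),
  col2 = c(2018, 76, 35, 45),
  col3 = c(2017, 79, 38, 46),
  stringsAsFactors = FALSE
)

# Assumed formats: four-digit year first, or four-digit year last.
# ^ and $ anchor the match so the whole value must have this shape.
date_pattern <- "^([0-9]{4}/[0-9]{2}/[0-9]{2}|[0-9]{2}/[0-9]{2}/[0-9]{4})$"

# Keep only the rows whose col1 matches the anchored pattern.
data1_dates <- data1[grepl(date_pattern, data1$col1), ]
data1_dates
```

With the sample data this keeps the same two rows as the unanchored version; the difference only shows up when col1 holds longer strings with a date embedded in them.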

Upvotes: 1
