Reputation: 133
I have an ID column that should always be formatted like ABCDE123: five letters followed by three digits, with no gaps or symbols.
I know for sure there are a number of rows that don't correctly follow this format. Is it possible to filter by the string format in R, so that I can identify those rows and review them?
Tidyverse is preferred, but any solution would be helpful!
Upvotes: 2
Views: 533
Reputation: 887223
If these are 5 upper case letters followed by 3 digits, specify a regex that matches 5 upper case letters ([A-Z]{5}) from the start (^) of the string, followed by 3 digits ([0-9]{3}) at the end ($) of the string, and pass it to str_detect. This returns a logical vector, which filter uses to keep the rows whose ID matches the format.
library(dplyr)
library(stringr)
df1 %>%
  filter(str_detect(ID, '^[A-Z]{5}[0-9]{3}$'))
If the correctly formatted rows should be removed instead, so that only the rows that break the format are left for review, specify negate = TRUE in str_detect.
df1 %>%
  filter(str_detect(ID, '^[A-Z]{5}[0-9]{3}$', negate = TRUE))
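To see which rows the negated filter returns, here is a minimal sketch on made-up data. The ID values and the names pattern, bad_rows and bad_or_missing are assumptions for illustration, not taken from the question. It also shows one thing worth checking: filter drops rows where the condition is NA, so a missing ID would not show up unless it is tested for explicitly.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; the ID values are made up for illustration
df1 <- tibble(
  ID = c("ABCDE123", "abcde123", "ABCD1234", "ABCDE-123", "ABCDE12", NA)
)

# Same pattern as above: 5 upper case letters, then 3 digits, nothing else
pattern <- "^[A-Z]{5}[0-9]{3}$"

# Rows whose ID is present but does not follow the format
bad_rows <- df1 %>%
  filter(str_detect(ID, pattern, negate = TRUE))

# str_detect() returns NA for a missing ID and filter() drops NA conditions,
# so missing IDs have to be included explicitly if they should be reviewed too
bad_or_missing <- df1 %>%
  filter(is.na(ID) | !str_detect(ID, pattern))
```

Here bad_rows holds the four malformed IDs, and bad_or_missing additionally holds the row with the missing ID.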
Or, as @BenBolker mentioned in the comments, [[:upper:]]{5} would be more generic than [A-Z]{5}, since it can also match upper case letters outside the plain A to Z range.
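Since any solution was welcome, the same check can probably be done in base R without extra packages. This is only a sketch on made-up data: the ID values and the names pattern, is_valid and invalid_rows are assumptions for illustration. It uses the [[:upper:]] class from the comment, together with [[:digit:]] for the digits.

```r
# Hypothetical toy data; the ID values are made up for illustration
df1 <- data.frame(
  ID = c("ABCDE123", "abcde123", "ABCDE12", "ABCD1234"),
  stringsAsFactors = FALSE
)

# POSIX character classes in place of the explicit [A-Z] and [0-9] ranges
pattern <- "^[[:upper:]]{5}[[:digit:]]{3}$"

# TRUE where the ID follows the format, FALSE otherwise
is_valid <- grepl(pattern, df1$ID)

# Keep only the rows that break the format, for review
invalid_rows <- df1[!is_valid, , drop = FALSE]
```

Note that grepl returns FALSE for a missing value, so with this version rows with an NA in ID end up in invalid_rows as well.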
Upvotes: 3