Reputation: 659
Say I have a data frame like:
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2),
                 value = c('ABC000', 'ABC002', 'ABC003', 'ACC000', 'ABC005', 'ABC006', 'ABC007'),
                 keep = c(1, 0, 1, 0, 0, 1, 0))
  ID  value keep
1  1 ABC000    1
2  1 ABC002    0
3  1 ABC003    1
4  2 ACC000    0
5  2 ABC005    0
6  2 ABC006    1
7  2 ABC007    0
I want to keep the values where the prefix is 'ABC', the fourth and fifth characters can be anything, and the last character is 0, 3 or 6. Is there a way to do this in R?
Upvotes: 0
Views: 298
Reputation: 26353
Try
value[startsWith(value, "ABC") & grepl("[036]$", value)]
# [1] "ABC000" "ABC003" "ABC006"
To create a new column of 1s and 0s indicating whether the condition is TRUE, you can do
+(startsWith(value, "ABC") & grepl("[036]$", value))
# [1] 1 0 1 0 0 1 0
data
value=c('ABC000', 'ABC002', 'ABC003', 'ACC000', 'ABC005', 'ABC006', 'ABC007')
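Applied to the data frame from the question rather than a bare vector, the same check might look like the sketch below. The column name flag and the object name kept are hypothetical, chosen only for illustration, and stringsAsFactors = FALSE is set explicitly because startsWith() needs character input (older R versions would otherwise turn value into a factor).

```r
# Rebuild the example data frame from the question
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2),
  value = c("ABC000", "ABC002", "ABC003", "ACC000", "ABC005", "ABC006", "ABC007"),
  keep = c(1, 0, 1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Logical condition: starts with "ABC" and ends in 0, 3 or 6
matches <- startsWith(df$value, "ABC") & grepl("[036]$", df$value)

# Unary + coerces TRUE/FALSE to 1/0; 'flag' is a hypothetical column name
df$flag <- +matches

# Keep only the matching rows
kept <- df[matches, ]
kept$value
# [1] "ABC000" "ABC003" "ABC006"
```

Note that this condition does not check the length of the string, so a value such as "ABC0" would also pass.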
Upvotes: 2
Reputation: 1868
You can use stringr functions and a regex like this:
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(value, pattern = "^ABC.{2}(0|3|6)$"))
# ID value keep
# 1 1 ABC000 1
# 2 1 ABC003 1
# 3 2 ABC006 1
Deconstructing the detection pattern:
^ABC, where ^ anchors the start of the string;
.{2}, where . is the "wildcard" character and {2} specifies that there are exactly two of them;
(0|3|6)$, which says that the string ends with 0, 3, or 6 (where $ anchors the end of the string).
The user mentioned in the comments an interest in creating a new field that flags whether the value field matches the specified condition.
You can add a new field using mutate and if_else as follows:
df %>%
  mutate(flag = if_else(str_detect(value, pattern = "^ABC.{2}[036]$"), 1, 0))
# ID value keep flag
# 1 1 ABC000 1 1
# 2 1 ABC002 0 0
# 3 1 ABC003 1 1
# 4 2 ACC000 0 0
# 5 2 ABC005 0 0
# 6 2 ABC006 1 1
# 7 2 ABC007 0 0
The if_else call assigns 1 when value matches the pattern and 0 when it doesn't.
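If you would rather avoid extra packages, the same anchored pattern should also work with base R's grepl(). The following is a minimal sketch; the test vector x is made up for illustration and is not part of the question's data. It shows that anchoring both ends makes the pattern require exactly six characters.

```r
# Hypothetical test strings: one valid value and three near misses
x <- c("ABC000", "ABC0", "XABC003", "ABC1236")

# ^ABC    string starts with "ABC"
# .{2}    exactly two characters of any kind
# [036]$  one final character that is 0, 3 or 6
pattern <- "^ABC.{2}[036]$"

hits <- grepl(pattern, x)
hits
# [1]  TRUE FALSE FALSE FALSE

# as.integer() gives the same 1/0 flag as if_else(..., 1, 0)
as.integer(hits)
# [1] 1 0 0 0
```

Strings that are too short, too long, or that have anything before the ABC prefix are all rejected.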
Upvotes: 1