biostatguy12

Reputation: 659

Is there a way in R to select values using wildcards for certain characters?

Say I have something like:

df<-data.frame(ID=c(1, 1, 1,2,2,2,2),
               value=c('ABC000', 'ABC002', 'ABC003', 'ACC000', 'ABC005', 'ABC006', 'ABC007'),
               keep=c(1, 0, 1,0,0,1,0))

  ID  value keep
1  1 ABC000    1
2  1 ABC002    0
3  1 ABC003    1
4  2 ACC000    0
5  2 ABC005    0
6  2 ABC006    1
7  2 ABC007    0

I want to keep the values whose prefix is 'ABC' and whose last character is 0, 3, or 6. The fourth and fifth characters can be anything. Is there a way to do this in R?

Upvotes: 0

Views: 298

Answers (2)

markus

Reputation: 26353

Try

value[startsWith(value, "ABC") & grepl("[036]$", value)]
# [1] "ABC000" "ABC003" "ABC006"

To create a new column of 1s and 0s indicating whether the condition is TRUE, you can do

+(startsWith(value, "ABC") & grepl("[036]$", value))
# [1] 1 0 1 0 0 1 0

data

value=c('ABC000', 'ABC002', 'ABC003', 'ACC000', 'ABC005', 'ABC006', 'ABC007')
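
The snippets above work on a plain character vector. As a minimal sketch of how the same condition could be applied to the data frame from the question, assuming `value` is stored as character (the names `matches` and `kept` and the new column `flag` are made up for illustration):

```r
# Sample data from the question. stringsAsFactors = FALSE keeps `value` as
# character on R versions before 4.0, which startsWith() requires.
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2),
                 value = c("ABC000", "ABC002", "ABC003", "ACC000",
                           "ABC005", "ABC006", "ABC007"),
                 keep = c(1, 0, 1, 0, 0, 1, 0),
                 stringsAsFactors = FALSE)

# TRUE where the prefix is "ABC" and the last character is 0, 3 or 6.
matches <- startsWith(df$value, "ABC") & grepl("[036]$", df$value)

# Keep only the matching rows ...
kept <- df[matches, ]

# ... or add a 0/1 indicator column instead of filtering.
df$flag <- as.integer(matches)
```

Note that this condition only checks the prefix and the final character, so it relies on every value being six characters long, as in the sample data.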

Upvotes: 2

OTStats

Reputation: 1868

You can use stringr functions and a regex like this:

library(dplyr)
library(stringr)

df %>% 
  filter(str_detect(value, pattern = "^ABC.{2}(0|3|6)$"))

#   ID  value keep
# 1  1 ABC000    1
# 2  1 ABC003    1
# 3  2 ABC006    1

The detection pattern breaks down as follows:

  • ^ABC: the string must start with ABC, where ^ anchors the start of the string,
  • .{2}: . is your "wildcard" character, matching any single character, and {2} requires exactly two of them,
  • (0|3|6)$: the string must end with 0, 3, or 6, where $ anchors the end of the string.
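
To see what each piece of the pattern contributes, here is a small base R sketch that runs it against a few strings. The test strings beyond the question's own values are made up for illustration, and `grepl` is used instead of `str_detect` only so that the example needs no packages:

```r
# Same pattern as above, written with a character class for the last position.
pattern <- "^ABC.{2}[036]$"

# Hypothetical test strings chosen to exercise each part of the pattern.
test_values <- c("ABC000",   # matches: ABC + two characters + 0
                 "ABC002",   # fails: last character is not 0, 3 or 6
                 "ACC000",   # fails: prefix is not ABC
                 "ABC0",     # fails: too short for .{2} plus a final character
                 "ABC1236",  # fails: too long, $ requires the string to end after six characters
                 "XABC003")  # fails: ^ requires ABC at the very start

result <- grepl(pattern, test_values)
names(result) <- test_values
print(result)
```

Because of the two anchors and the `{2}` quantifier, only strings of exactly six characters can match, which is stricter than checking the prefix and the last character alone.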

Edit

The asker mentioned in the comments wanting a new field that flags whether the value field matches the specified condition.

You can add a new field using mutate and if_else as follows:

df %>% 
  mutate(flag = if_else(str_detect(value, pattern = "^ABC.{2}[036]$"), 1, 0))

#   ID  value keep flag
# 1  1 ABC000    1    1
# 2  1 ABC002    0    0
# 3  1 ABC003    1    1
# 4  2 ACC000    0    0
# 5  2 ABC005    0    0
# 6  2 ABC006    1    1
# 7  2 ABC007    0    0

The if_else call assigns 1 when value matches the pattern and 0 when it does not. The character class [036] used here is equivalent to the (0|3|6) alternation in the first snippet.

Upvotes: 1
