Reputation: 45
I have a dataset with a column in which I need to check whether a particular pattern exists.
Example data frame:
DF <- data.frame(TextCol = c("Card number is GIFT987654","GIFT564738","no card number","543gift111111","number:gift9384730"))
What I need to do is add another column that says Yes or No depending on whether the following pattern matches:
the first 4 characters are fixed, "GIFT", followed by exactly 6 digits (0 to 9)
So I need the following dataframe as the result:
DF <- data.frame(TextCol = c("Card number is GIFT987654","GIFT564738","no card number","543gift111111","number:gift9384730"), Match = c("Yes","Yes","No","Yes","Yes"))
Any ideas?
Upvotes: 1
Views: 425
Reputation: 161155
The most basic approaches:
DF$yesno1 <- grepl("[A-Za-z]{4}[0-9]{6}", DF$TextCol) # if any four letters will do
DF$yesno2 <- grepl("GIFT[0-9]{6}", DF$TextCol, ignore.case = TRUE) # literal "GIFT", case-insensitive
DF$yesno3 <- grepl("(GIFT|gift)[0-9]{6}", DF$TextCol) # another way: all-upper or all-lower only
DF
#                     TextCol yesno1 yesno2 yesno3
# 1 Card number is GIFT987654   TRUE   TRUE   TRUE
# 2                GIFT564738   TRUE   TRUE   TRUE
# 3            no card number  FALSE  FALSE  FALSE
# 4             543gift111111   TRUE   TRUE   TRUE
# 5        number:gift9384730   TRUE   TRUE   TRUE
The regular expression trick here is knowing that {m,n} will find between m and n instances of the previous pattern ([0-9] here). {m,} means at least m, {,n} means at most n, and {m} means exactly m.
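As a small illustration of these quantifiers (the test strings below are made up for the example, not taken from the question): because the patterns above are not anchored, {6} only requires that six digits follow "GIFT", so a longer run of digits still matches. If exactly six digits are needed, one possible approach is to also rule out a seventh digit.

```r
x <- c("GIFT12345", "GIFT123456", "GIFT1234567")

# {6}: six digits must follow "GIFT"; unanchored, so a longer run also matches
m_six <- grepl("GIFT[0-9]{6}", x)
m_six
# [1] FALSE  TRUE  TRUE

# Exactly six: the digits must be followed by a non-digit or the end of the string
m_exact <- grepl("GIFT[0-9]{6}([^0-9]|$)", x)
m_exact
# [1] FALSE  TRUE FALSE

# {4,5} with anchors: the whole string is "GIFT" plus four or five digits
m_range <- grepl("^GIFT[0-9]{4,5}$", x)
m_range
# [1]  TRUE FALSE FALSE
```

The unanchored behaviour is what lets "number:gift9384730" (seven digits) count as a match in the output above.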
In R, I tend to prefer logical types instead of strings to indicate boolean conditions, but if you really need it to be "yes" and "no", then ifelse(x, "yes", "no") on whichever solution you choose should work.
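For completeness, a minimal sketch of that conversion, using the question's data and the case-insensitive pattern. The intermediate name hit is only for illustration, and the "Yes"/"No" capitalisation follows the question's desired output.

```r
DF <- data.frame(TextCol = c("Card number is GIFT987654", "GIFT564738",
                             "no card number", "543gift111111",
                             "number:gift9384730"))

# Compute the logical match first, then map TRUE/FALSE to the string labels
hit <- grepl("GIFT[0-9]{6}", DF$TextCol, ignore.case = TRUE)
DF$Match <- ifelse(hit, "Yes", "No")
DF$Match
# [1] "Yes" "Yes" "No"  "Yes" "Yes"
```

Keeping the logical vector around (hit here) means it can still be used directly for subsetting, e.g. DF[hit, ].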
Upvotes: 3
Reputation: 73842
Using regexpr:
# The alternation is grouped so that \\d{6} applies to both "GIFT" and "gift"
transform(DF, Match = ifelse(as.numeric(regexpr("(GIFT|gift)\\d{6}", DF$TextCol)) > 0, "yes", "no"))
#                     TextCol Match
# 1 Card number is GIFT987654   yes
# 2                GIFT564738   yes
# 3            no card number    no
# 4             543gift111111   yes
# 5        number:gift9384730   yes
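One thing to watch with alternation in a pattern like this: | has the lowest precedence in a regular expression, so an ungrouped "GIFT|gift\\d{6}" reads as "GIFT" alone, or "gift" followed by six digits. The sketch below compares the ungrouped and grouped forms on made-up strings, using grepl for brevity rather than regexpr.

```r
x <- c("GIFT12", "gift123456", "GIFT123456")

# Ungrouped: "GIFT" on its own is enough, so "GIFT12" matches too
loose <- grepl("GIFT|gift\\d{6}", x)
loose
# [1] TRUE TRUE TRUE

# Grouped: either spelling must be followed by six digits
strict <- grepl("(GIFT|gift)\\d{6}", x)
strict
# [1] FALSE  TRUE  TRUE
```

On the question's sample data both forms give the same result, since every row containing "GIFT" also has six digits after it.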
Upvotes: 2