NonieTech

Reputation: 45

Pattern matching 4 fixed letters followed by any 6 digits

I have a dataset with a column in which I need to check whether a particular pattern exists.

Example data:

DF <- data.frame(TextCol = c("Card number is GIFT987654","GIFT564738","no card number","543gift111111","number:gift9384730"))

What I need to do is add another column that says Yes or No depending on whether the following pattern matches:

first 4 characters are fixed - "GIFT", followed by exactly 6 numbers (any digits from 0 to 9)

So I need the following dataframe as the result:

DF <- data.frame(TextCol = c("Card number is GIFT987654","GIFT564738","no card number","543gift111111","number:gift9384730"), Match = c("Yes","Yes","No","Yes","Yes"))

Any ideas?

Upvotes: 1

Views: 425

Answers (2)

r2evans

Reputation: 161155

The most basic approach uses grepl:

DF$yesno1 <- grepl("[A-Za-z]{4}[0-9]{6}", DF$TextCol)              # if any four-letters works
DF$yesno2 <- grepl("GIFT[0-9]{6}", DF$TextCol, ignore.case = TRUE) # verbatim, case-insens
DF$yesno3 <- grepl("(GIFT|gift)[0-9]{6}", DF$TextCol)              # another way
DF
#                     TextCol yesno1 yesno2 yesno3
# 1 Card number is GIFT987654   TRUE   TRUE   TRUE
# 2                GIFT564738   TRUE   TRUE   TRUE
# 3            no card number  FALSE  FALSE  FALSE
# 4             543gift111111   TRUE   TRUE   TRUE
# 5        number:gift9384730   TRUE   TRUE   TRUE

The regular expression trick here is knowing that {m,n} will find between m and n instances of the previous pattern ([0-9] here). {m,} means at least m, {,n} means at most n, and {m} means exactly m.
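One consequence worth illustrating: {6} fixes the number of repetitions inside the match, but grepl is unanchored, so a longer run of digits still contains a six-digit match. That is consistent with row 5 above (gift9384730 has seven digits). The sketch below uses made-up test strings, not data from the question, and the "strict" pattern is just one possible way to reject longer runs if that is ever needed.

```r
# Hypothetical test strings (not from the question): 5, 6 and 7 digits after GIFT
x <- c("GIFT12345", "GIFT123456", "GIFT1234567")

# Unanchored: needs six digits after GIFT, more digits may follow
unanchored <- grepl("GIFT[0-9]{6}", x)

# Stricter variant: the six digits must be followed by a non-digit or end of string
strict <- grepl("GIFT[0-9]{6}([^0-9]|$)", x)

data.frame(x, unanchored, strict)
```

With these inputs only the six- and seven-digit strings match the unanchored pattern, and only the six-digit string matches the stricter one.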

In R, I tend to prefer logical types over strings to indicate boolean conditions, but if you really need "yes" and "no", then ifelse(x, "yes", "no") applied to whichever of the columns above you choose should work.
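As a minimal sketch of that conversion, here is the case-insensitive pattern combined with ifelse to produce the Match column from the question. The intermediate name hit is only illustrative.

```r
# Sample data from the question
DF <- data.frame(TextCol = c("Card number is GIFT987654", "GIFT564738",
                             "no card number", "543gift111111",
                             "number:gift9384730"))

# Logical vector: TRUE where "GIFT" (any case) is followed by six digits
hit <- grepl("GIFT[0-9]{6}", DF$TextCol, ignore.case = TRUE)

# Convert the logical vector to the requested Yes/No labels
DF$Match <- ifelse(hit, "Yes", "No")
DF
```

Keeping the logical vector around (here as hit) makes later filtering easier, for example DF[hit, ].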

Upvotes: 3

jay.sf

Reputation: 73842

Using regexpr.

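# Group the alternation so that \\d{6} applies to both "GIFT" and "gift";
# regexpr returns the match position, or -1 when there is no match
transform(DF, Match = ifelse(as.numeric(regexpr("(GIFT|gift)\\d{6}", DF$TextCol)) > 0, "yes", "no"))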
#                     TextCol Match
# 1 Card number is GIFT987654   yes
# 2                GIFT564738   yes
# 3            no card number    no
# 4             543gift111111   yes
# 5        number:gift9384730   yes
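A point to keep in mind with alternation in patterns like this: | has the lowest precedence in a regular expression, so an ungrouped "GIFT|gift\\d{6}" reads as "GIFT" or "gift\\d{6}", and a bare "GIFT" with no digits would match. The sketch below shows the difference on made-up strings (not from the question); grepl is used here only to keep the comparison short.

```r
# Hypothetical test strings: "GIFT" with no digits, then two valid card numbers
x <- c("GIFT", "gift123456", "GIFT123456")

# Ungrouped: alternatives are "GIFT" and "gift\\d{6}"
ungrouped <- grepl("GIFT|gift\\d{6}", x)

# Grouped: either spelling must be followed by six digits
grouped <- grepl("(GIFT|gift)\\d{6}", x)

data.frame(x, ungrouped, grouped)
```

On the sample data in the question both forms give the same result, since every uppercase "GIFT" there happens to be followed by digits.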

Upvotes: 2
