I have a set of codes I want to check in my dataframe, and if they exist I want to create a column to indicate TRUE or FALSE.
Some of the codes I have in my dataframe: OO14562, MM156789076, AB1234674, HIB00000, POL112310
The dataframe is here:
df<-structure(list(Codes = structure(c(5L, 4L, 1L, 3L, 7L, 8L, 2L,
6L), .Label = c("AB1234674", "AB13", "HIB00000", "MM156789076",
"OO14562", "POL1123", "POL112310", "TY543"), class = "factor")), .Names = "Codes", row.names = c(NA,
-8L), class = "data.frame")
According to the dataframe, the first five should return TRUE, and the last three should be FALSE.
My code is here
gsub([OO|MM|AB|HIB|POL[0-9]{5-9})
But that is not getting me anywhere.
One, we need to use parentheses, not brackets, to group the letter sets. Brackets define a character class, which matches exactly one character from the set listed inside them, and inside a class the pipe is a literal character rather than alternation. So [aa|bb|cc] will actually match a single a, b, c, or a literal |, which is not the behavior you want.
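A quick check of the difference, using short hypothetical test strings (not from the question):

```r
x <- c("a", "b", "|", "aa")

# Character class: matches exactly ONE character from the set {a, b, c, |}
in_class <- grepl("^[aa|bb|cc]$", x)

# Group with alternation: matches one of the whole strings "aa", "bb", "cc"
in_group <- grepl("^(aa|bb|cc)$", x)

print(in_class)  # -> TRUE TRUE TRUE FALSE
print(in_group)  # -> FALSE FALSE FALSE TRUE
```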
Two, we'll use grepl because it returns a logical vector; there is no need to use gsub, which replaces matched text rather than testing whether a match exists.
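To see the difference between the two functions, here is a small sketch on two of the question's codes. Note that gsub() also needs a replacement argument, which the original attempt omitted; the empty string used here is only an illustrative choice.

```r
codes <- c("OO14562", "TY543")
pattern <- "^(OO|MM|AB|HIB|POL)\\d{5,9}$"

# grepl(): one logical per element, TRUE where the pattern matches
has_match <- grepl(pattern, codes)

# gsub(): returns character, with matched text replaced (here by "")
replaced <- gsub(pattern, "", codes)

print(has_match)  # -> TRUE FALSE
print(replaced)   # -> "" "TY543"
```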
Three, the number of repetitions to match is specified in curly braces { }, but the minimum and maximum are separated by a comma, not a dash: {5,9}.
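A sketch of the {min,max} quantifier on hypothetical digit strings, anchored so the entire string must consist of digits. Leaving the maximum empty, as in {5,}, means "5 or more":

```r
digits <- c("1234", "12345", "123456789", "1234567890")

# {5,9}: between 5 and 9 digits, inclusive
ok <- grepl("^\\d{5,9}$", digits)

# {5,}: at least 5 digits, no upper bound
at_least_5 <- grepl("^\\d{5,}$", digits)

print(ok)          # -> FALSE TRUE TRUE FALSE
print(at_least_5)  # -> FALSE TRUE TRUE TRUE
```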
You could also use [0-9] instead of \\d (any digit), but I like \\d for brevity.
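For the plain ASCII codes in this question, the two spellings give the same result; a quick comparison on three of the question's codes:

```r
codes <- c("OO14562", "AB13", "POL1123")

with_d     <- grepl("^(OO|MM|AB|HIB|POL)\\d{5,9}$", codes)
with_range <- grepl("^(OO|MM|AB|HIB|POL)[0-9]{5,9}$", codes)

print(with_d)                        # -> TRUE FALSE FALSE
print(identical(with_d, with_range)) # -> TRUE
```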
And for completeness, I added ^ and $ to anchor the pattern to the beginning and end of the string, so the whole code must match rather than just part of it.
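To illustrate why the anchors matter, here are two hypothetical codes (invented for this example) that the unanchored pattern accepts because a valid-looking code appears somewhere inside them:

```r
pattern_unanchored <- "(OO|MM|AB|HIB|POL)\\d{5,9}"
pattern_anchored   <- "^(OO|MM|AB|HIB|POL)\\d{5,9}$"

# Hypothetical inputs: an extra leading letter, and 10 digits instead of 5-9
tricky <- c("XOO14562", "OO1234567890", "OO14562")

# Without anchors, a matching substring is enough
unanchored <- grepl(pattern_unanchored, tricky)

# With anchors, the entire string must fit the pattern
anchored <- grepl(pattern_anchored, tricky)

print(unanchored)  # -> TRUE TRUE TRUE
print(anchored)    # -> FALSE FALSE TRUE
```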
This gives us:
df$check <- grepl("^(OO|MM|AB|HIB|POL)\\d{5,9}$", df$Codes)
Codes check
1 OO14562 TRUE
2 MM156789076 TRUE
3 AB1234674 TRUE
4 HIB00000 TRUE
5 POL112310 TRUE
6 TY543 FALSE
7 AB13 FALSE
8 POL1123 FALSE
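A self-contained version for reproducing the result above. It rebuilds the question's data with data.frame() instead of the dput() structure (a substitution made here for brevity); it has the same codes in the same row order, stored as a factor column.

```r
# Same codes and row order as the question's data; stringsAsFactors = TRUE
# stores them as a factor, like the dput() output in the question
df <- data.frame(
  Codes = c("OO14562", "MM156789076", "AB1234674", "HIB00000",
            "POL112310", "TY543", "AB13", "POL1123"),
  stringsAsFactors = TRUE
)

# grepl() matches against the factor's labels, so no as.character() is needed
df$check <- grepl("^(OO|MM|AB|HIB|POL)\\d{5,9}$", df$Codes)
print(df)
```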