Reputation: 465
I need to align the formatting of some clinical trial IDs to merge two databases. For example, in database A patient 123 visit 1 is stored as '123v01' and in database B as just '123v1'.
I can match A to B by using grep to find the IDs containing 'v0' and stripping out the zero to leave just 'v', but for academic interest and to expand my R / regex skills, I want to reverse match B to A by matching only those containing 'v' followed by exactly 1 digit, so I can then separately pad that digit with a leading zero.
For a reprex:
string <- c("123v1", "123v01", "123v001")
I can match those with >= 2 digits following a 'v', then inverse subset
> idx <- grepl("v(\\d{2})", string)
> string[!idx]
[1] "123v1"
But there must be a way to match 'v' followed by just a single digit only? I have tried the lookarounds
# Negative look ahead "v not followed by 2+ digits"
grepl("v(?!\\d{2})", string)
# Positive look behind "single digit following v"
grepl("(?<=v)\\d{1})", string)
But both return an 'invalid regex' error.
Any suggestions?
Upvotes: 3
Views: 736
Reputation: 626748
You may use
grepl("v\\d(?!\\d)", string, perl=TRUE)
The v\d(?!\d) pattern matches v, then 1 digit, and then makes sure there is no digit immediately to the right of the current location (i.e. after the v + 1 digit).
See the regex demo.
Note that you need to enable the PCRE regex flavor with the perl=TRUE argument.
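Since the end goal in the question is to pad the single digit with a leading zero, the same lookahead can be reused in sub() with a capture group. This is only a minimal sketch on the sample vector from the question; the variable name padded is made up for illustration.

```r
# Sample IDs from the question
string <- c("123v1", "123v01", "123v001")

# Capture the lone digit after 'v' and put it back with a leading zero.
# perl = TRUE is needed because of the (?!\\d) lookahead.
padded <- sub("v(\\d)(?!\\d)", "v0\\1", string, perl = TRUE)
print(padded)
```

IDs that already have two or more digits after the v are left unchanged, because the lookahead stops the pattern from matching them.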
Upvotes: 1
Reputation: 2019
You need to set the perl=TRUE argument in your grepl call.
e.g.
grepl("v(?!\\d{2})", string, perl=TRUE)
[1] TRUE FALSE FALSE
See this question for more info.
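One caveat worth checking against your own data: v(?!\\d{2}) only asserts that v is not followed by two digits, so it also matches a v with no digit after it at all. A small sketch comparing it with the stricter v\\d(?!\\d); the "123v" entry is a hypothetical malformed ID added for illustration, and the names ids, loose and strict are made up.

```r
# "123v" is a hypothetical malformed ID, not from the question
ids <- c("123v1", "123v01", "123v")

# 'v' not followed by two digits: also TRUE when no digit follows at all
loose <- grepl("v(?!\\d{2})", ids, perl = TRUE)

# 'v' followed by one digit and then no further digit
strict <- grepl("v\\d(?!\\d)", ids, perl = TRUE)

print(data.frame(ids, loose, strict))
```

If every ID is guaranteed to have at least one digit after the v, the two patterns select the same elements.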
Upvotes: 3