Matching character followed by exactly 1 digit

Question

I need to align the formatting of some clinical trial IDs to merge two databases. For example, patient 123 visit 1 is stored as '123v01' in database A and as just '123v1' in database B.

I can match A to B by using grep to find the IDs containing 'v0' and stripping out the zero to leave just 'v'. But for academic interest and to expand my R / regex skills, I want to reverse match B to A by matching only those containing 'v' followed by exactly 1 digit, so I can then separately pad that digit with a leading zero.

For a reprex:

string <- c("123v1", "123v01", "123v001")

I can match those with >= 2 digits following a 'v', then inverse subset

> idx <- grepl("v(\\d{2})", string)
> string[!idx]
[1] "123v1"

But there must be a way to match 'v' followed by exactly one digit? I have tried lookarounds:

# Negative look ahead "v not followed by 2+ digits"
grepl("v(?!\\d{2})", string)

# Positive look behind "single digit following v"
grepl("(?<=v)\\d{1})", string)

But both return an 'invalid regex' error

Any suggestions?

Wiktor Stribiżew · Accepted Answer

You may use

grepl("v\\d(?!\\d)", string, perl=TRUE)

The v\d(?!\d) pattern matches v, then exactly one digit, and then makes sure there is no digit immediately to the right of the current location (i.e. after the v + 1 digit).

See the regex demo.

Note that you need to enable the PCRE regex flavor with the perl=TRUE argument, because R's default regex engine (TRE) does not support lookarounds.
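
To tie this back to the reprex in the question, here is a minimal sketch of the match and of the padding step the asker mentions. The names idx and padded and the sub() call are illustrative assumptions, not part of the original answer. The last line shows a possible alternative that avoids lookarounds and so should work without perl=TRUE.

```r
string <- c("123v1", "123v01", "123v001")

# TRUE only where 'v' is followed by exactly one digit
idx <- grepl("v\\d(?!\\d)", string, perl = TRUE)
string[idx]

# Pad the lone digit with a leading zero (illustrative; not from the answer)
padded <- sub("v(\\d)(?!\\d)", "v0\\1", string, perl = TRUE)
padded

# Possible lookaround-free alternative: one digit, then a non-digit or end of string
idx_tre <- grepl("v[0-9]([^0-9]|$)", string)
```

Capturing the digit in a group and writing it back with \\1 keeps the match and the padding in a single step, so the IDs that already have two or more digits are left untouched.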
