Brent

Reputation: 465

Matching character followed by exactly 1 digit

I need to align the formatting of some clinical trial IDs to merge two databases. For example, in database A patient 123 visit 1 is stored as '123v01' and in database B as just '123v1'.

I can match A to B by using grepl to find the IDs containing 'v0' and stripping the zero to leave just 'v'. But for academic interest and to expand my R / regex skills, I want to go the other way and match B to A: find only the IDs containing 'v' followed by exactly one digit, so I can then pad that digit with a leading zero.

For a reprex:

string <- c("123v1", "123v01", "123v001")

I can match those with >= 2 digits following a 'v', then invert the subset:

> idx <- grepl("v(\\d{2})", string)
> string[!idx]
[1] "123v1"

But surely there is a way to match 'v' followed by exactly one digit directly? I have tried lookarounds:

# Negative look ahead "v not followed by 2+ digits"
grepl("v(?!\\d{2})", string)

# Positive look behind "single digit following v"
grepl("(?<=v)\\d{1}", string)

But both return an 'invalid regex' error

Any suggestions?

Upvotes: 3

Views: 736

Answers (2)

Wiktor Stribiżew

Reputation: 626748

You may use

grepl("v\\d(?!\\d)", string, perl=TRUE)

The v\d(?!\d) pattern matches v, then exactly one digit, and then the negative lookahead makes sure there is no digit immediately to the right of the current location (i.e. right after the v + 1 digit).

See the regex demo.

Note that you need to enable PCRE regex flavor with the perl=TRUE argument.
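As a rough sketch of the follow-up step mentioned in the question (padding the lone digit), the same pattern can be given a capture group and passed to sub. The variable names single and padded are only illustrative, and this assumes each ID contains a single 'v':

```r
string <- c("123v1", "123v01", "123v001")

# TRUE only where 'v' is followed by exactly one digit
single <- grepl("v\\d(?!\\d)", string, perl = TRUE)
# [1]  TRUE FALSE FALSE

# Pad the lone digit with a leading zero; non-matching IDs are left unchanged
padded <- sub("v(\\d)(?!\\d)", "v0\\1", string, perl = TRUE)
# [1] "123v01"  "123v01"  "123v001"
```

Because sub leaves elements without a match untouched, there is no need to subset with the logical index first.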

Upvotes: 1

meenaparam

Reputation: 2019

You need to set the perl=TRUE flag on your grepl function.

e.g.

grepl("v(?!\\d{2})", string, perl=TRUE)
[1]  TRUE FALSE FALSE

See this question for more info.
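One caveat worth checking against your own data: this pattern reads as "v not followed by two digits", which is slightly looser than "v followed by exactly one digit". A small sketch of the difference, using a made-up ID "123v" (not from the question) that has no visit digit at all:

```r
# "123v" is a hypothetical extra ID, added only to expose the difference
ids <- c("123v", "123v1", "123v01", "123v001")

# "v not followed by two digits": also TRUE for the bare trailing 'v'
not_two <- grepl("v(?!\\d{2})", ids, perl = TRUE)
# [1]  TRUE  TRUE FALSE FALSE

# "v, one digit, then no further digit": TRUE only for "123v1"
exactly_one <- grepl("v\\d(?!\\d)", ids, perl = TRUE)
# [1] FALSE  TRUE FALSE FALSE

# Lookaround-free alternative that works without perl = TRUE,
# assuming the visit number always ends the ID
anchored <- grepl("v[0-9]$", ids)
# [1] FALSE  TRUE FALSE FALSE
```

For the three strings in the reprex all of these give the same result, so the difference only matters if malformed IDs can occur.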

Upvotes: 3
