Reputation: 584
This may be a very simple question, but I do not have much experience with regular expressions. This page is a good source of regex patterns, but I could not figure out how to include them in the following code:
data %>% filter(grepl("^A01H1", icl))
Question
I would like to extract the values in one column of my data frame that start with A01H1 followed by up to 2 more digits, for example A01H100, A01H140, A01H110. I could not find a solution despite a few attempts:
Attempts
I looked at this question, from which I used ^A01H1[0-9].{2} to select up to two more digits.
I tried adding any character, ^A01H1[0-9][0-9][x-y], to stop after two digits.
Any help would be much appreciated :)
Upvotes: 0
Views: 62
Reputation: 626748
It looks as if you want to match a part of a string that starts with A01H1, then contains 1 or 2 digits, and then is not followed by any digit.
You may use
^A01H1\d{1,2}(?!\d)
See the regex demo. If there can be no text after two digits at all, replace (?!\d) with $.
Details
^ - start of string
A01H1 - literal string
\d{1,2} - one to two digits
(?!\d) - no digit allowed immediately to the right
$ - end of string
In R, you could use it like
grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl=TRUE)
Or, with the string end anchor,
grepl("^A01H1\\d{1,2}$", icl)
Note that perl=TRUE is only necessary when using PCRE-specific syntax such as (?!\d), a negative lookahead.
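A quick way to see how the two variants differ is to run both on a few test strings. This is only a sketch: the sample vector is made up for illustration (just the first three values come from the question), and nothing beyond base R's grepl() is used.

```r
# Hypothetical sample values; only the first three appear in the question.
icl <- c("A01H100", "A01H140", "A01H110", "A01H1",
         "A01H1234", "A01H12/00", "B01H100")

# Variant 1: one or two digits, then no further digit.
# The lookahead (?!\d) is PCRE syntax, so perl = TRUE is required.
lookahead <- grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl = TRUE)

# Variant 2: one or two digits, then the end of the string.
anchored <- grepl("^A01H1\\d{1,2}$", icl)

# Side-by-side comparison of both results.
data.frame(icl, lookahead, anchored)
```

With these inputs, the two variants disagree only on "A01H12/00": the lookahead version accepts it because the character after the digit is not a digit, while the anchored version rejects it because the string continues.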
Upvotes: 1
Reputation: 7724
You can use "^A01H1\\d{1,2}$".
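Plugged into the filter() call from the question, this might look as follows. The data frame and its values are invented for illustration; only the column name icl is taken from the question, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Hypothetical data; only the column name icl comes from the question.
data <- data.frame(
  id  = 1:5,
  icl = c("A01H100", "A01H140", "A01H1", "A01H1234", "C12N150"),
  stringsAsFactors = FALSE
)

# Keep rows whose icl is A01H1 followed by one or two digits and nothing else.
result <- data %>% filter(grepl("^A01H1\\d{1,2}$", icl))
result
```

Only the rows with "A01H100" and "A01H140" should remain: "A01H1" has no trailing digit, "A01H1234" has three, and "C12N150" has a different prefix.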
The first part ("^A01H1") you figured out yourself, so what are we doing in the second part ("\\d{1,2}$")?
\d matches any digit and is equivalent to [0-9]. Since we are working in R, you need to escape the \ and thus we use \\d.
{1,2} indicates we want 1 or 2 matches of \\d.
$ specifies the end of the string, so nothing may come afterwards; this prevents matching more than 2 digits.
Upvotes: 1