Amleto

Reputation: 584

Extract up to two more digits

This may be a very simple question, but I don't have much experience with regular expressions. This page is a good source of regex patterns, but I could not figure out how to use them in the following code:

library(dplyr)

data %>% filter(grepl("^A01H1", icl))

Question

I would like to extract the values in one column of my data frame that start with A01H1 followed by up to two more digits, for example A01H100, A01H140 and A01H110. I could not find a solution despite a few attempts:

Attempts

I looked at this question, from which I took ^A01H1[0-9].{2} to select up to two more digits.

I also tried appending a character class, ^A01H1[0-9][0-9][x-y], to make the match stop after two digits.
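To see why neither attempt (nor the original prefix-only pattern) does what is wanted, one can run all three on a few sample codes. A minimal sketch; the values in `codes` are made up for illustration and are not from the question's data:

```r
# Hypothetical sample codes (not taken from the question's data frame)
codes <- c("A01H10", "A01H100", "A01H140", "A01H1234", "A01H123x")

# Prefix only: keeps every code that starts with A01H1, however long it is
prefix_only <- grepl("^A01H1", codes)

# First attempt: one digit, then exactly two more characters of ANY kind,
# so it needs at least three characters after A01H1
attempt_1 <- grepl("^A01H1[0-9].{2}", codes)

# Second attempt: two digits, then one character from the range x-y,
# i.e. a literal "x" or "y" must follow the digits
attempt_2 <- grepl("^A01H1[0-9][0-9][x-y]", codes)

data.frame(codes, prefix_only, attempt_1, attempt_2)
```

The prefix-only pattern accepts everything, the first attempt rejects exactly the codes that are wanted (A01H100 has only two characters after A01H1) while accepting longer ones, and the second only matches when an "x" or "y" follows.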

Any help would be much appreciated :)

Upvotes: 0

Views: 62

Answers (2)

Wiktor Stribiżew

Reputation: 626748

It looks as if you want to match the part of a string that starts with A01H1, then contains one or two digits, and is not followed by another digit.

You may use

^A01H1\d{1,2}(?!\d)

See the regex demo. If nothing at all may follow the digits, replace (?!\d) with $.

Details

  • ^ - start of string
  • A01H1 - literal string
  • \d{1,2} - one to two digits
  • (?!\d) - no digit allowed immediately to the right
  • $ - end of string

In R, you could use it like this:

grepl("^A01H1\\d{1,2}(?!\\d)", icl, perl=TRUE)

Or, with the end-of-string anchor:

grepl("^A01H1\\d{1,2}$", icl)

Note that perl=TRUE is only necessary when using PCRE-specific syntax such as the negative lookahead (?!\d).
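The practical difference between the two patterns shows up when something other than a digit follows the code. A minimal sketch on made-up codes (the values in `codes` are hypothetical), which also uses base R's regexpr() and regmatches() to show which part of each string the lookahead pattern actually matches:

```r
# Hypothetical codes for illustration only
codes <- c("A01H10", "A01H140", "A01H1234", "A01H140abc")

# Lookahead: a third digit is rejected, but letters may follow the digits
with_lookahead <- grepl("^A01H1\\d{1,2}(?!\\d)", codes, perl = TRUE)

# End anchor: nothing at all may follow the one or two digits
with_anchor <- grepl("^A01H1\\d{1,2}$", codes)

data.frame(codes, with_lookahead, with_anchor)

# regexpr() gives the position of the first match (-1 if there is none);
# regmatches() then returns just the matched text for the strings that matched
m <- regexpr("^A01H1\\d{1,2}(?!\\d)", codes, perl = TRUE)
matched <- regmatches(codes, m)
matched
```

So the lookahead version suits codes that may be followed by other text, while the $ version suits a column that holds nothing but the code.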

Upvotes: 1

kath

Reputation: 7724

You can use "^A01H1\\d{1,2}$". You figured out the first part ("^A01H1") yourself, so what does the second part ("\\d{1,2}$") do?

  • \d matches any digit and is equivalent to [0-9]; because the pattern is written as an R string, the backslash itself must be escaped, so we write \\d
  • {1,2} means \\d must match one or two times
  • $ marks the end of the string, so nothing may come afterwards; this prevents matching more than two digits
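Plugging the pattern into the filter from the question is then a one-line change. The sketch below uses base R subsetting so it runs without dplyr; the data frame and its `n` column are made up for illustration, and only the column name `icl` comes from the question:

```r
# Hypothetical data frame; the question's real `data` is not shown
data <- data.frame(
  icl = c("A01H10", "A01H100", "A01H140", "A01H1234", "B01H100"),
  n = 1:5,
  stringsAsFactors = FALSE
)

# Keep rows whose icl is A01H1 followed by one or two digits and nothing else
kept <- data[grepl("^A01H1\\d{1,2}$", data$icl), ]
kept$icl
```

With dplyr loaded, the equivalent is data %>% filter(grepl("^A01H1\\d{1,2}$", icl)).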

Upvotes: 1
