Reputation: 115

Grepping in R for a particular pattern

I have a question about the grep function in R. I have a string like this:

"160627_NB551043_0004_AHCJCWBGX"

I don't need the whole name. What I need is only 1043. It's always going to be the last 4 digits in the NB section. Do you know how I can extract that with R?

Upvotes: 0

Answers (2)

R. Schifini

Reputation: 9313

This answer is similar to the one given by Psidom, but I'll consider the following what-if:
What if you don't know how many digits follow _NB? What if there are 7 or more digits?

The approach would be to capture all the digits between _NB and _:

NBdigits = sub(".*_NB(\\d+)_.*", "\\1", "160627_NB551043_0004_AHCJCWBGX")

which gives:

[1] "551043"

and then get the last 4 digits by taking the result modulo 10000 (this returns a number, so leading zeros are dropped: 0043 would come back as 43):

last4digits = as.numeric(NBdigits)%%10000

Result:

[1] 1043
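If the leading zeros matter (for example, if the value is an ID rather than a quantity), the last 4 characters can be taken from the captured string instead of converting to a number. This is only a sketch of that alternative; the input string with NB550043 and the names x, nb_digits and last4 are made up for illustration.

```r
# Hypothetical input whose last 4 NB digits start with zeros
x <- "160627_NB550043_0004_AHCJCWBGX"

# Capture all digits between _NB and the next underscore
nb_digits <- sub(".*_NB(\\d+)_.*", "\\1", x)

# Keep at most the last 4 characters; max(1, ...) handles runs shorter than 4
last4 <- substr(nb_digits, max(1, nchar(nb_digits) - 3), nchar(nb_digits))

last4                            # "0043" (a character string)
as.numeric(nb_digits) %% 10000   # 43 (the modulus version loses the zeros)
```

Which form is preferable depends on whether the result is used as text or as a number downstream.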

Edit: A couple more examples

What if there are fewer than 4 digits after NB?

as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB43_0004_AHCJCWBGX"))%%10000
[1] 43

If there are exactly 4?

as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB9876_0004_AHCJCWBGX"))%%10000
[1] 9876

More than 6?

as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB987654321_0004_AHCJCWBGX"))%%10000
[1] 4321

Since you say "I have a string like this", I can't assume that there are exactly 6 digits after NB. The only assumption I am making is that there is at least one digit. This solution works for any number of digits after NB (not 0 though!), up to roughly 15 digits, beyond which as.numeric can no longer represent the value exactly.
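One thing to watch for: when the pattern does not match at all, sub returns the input string unchanged, so as.numeric then yields NA with a warning. A possible way to make that case explicit, and to handle a whole vector of names at once, is sketched below using base R's regexec and regmatches. The function name extract_nb_last4 is hypothetical, and it returns strings rather than numbers.

```r
# Hypothetical helper: last (up to) 4 digits of the NB section, or NA if absent
extract_nb_last4 <- function(x) {
  # One list element per input: c(full match, captured digits), or character(0)
  hits <- regmatches(x, regexec("_NB(\\d+)_", x))
  vapply(hits, function(hit) {
    if (length(hit) < 2) return(NA_character_)  # no NB section found
    digits <- hit[2]
    substr(digits, max(1, nchar(digits) - 3), nchar(digits))
  }, character(1), USE.NAMES = FALSE)
}

extract_nb_last4(c("160627_NB551043_0004_AHCJCWBGX",
                   "160627_NB43_0004_AHCJCWBGX",
                   "no_nb_section_here"))
# [1] "1043" "43"   NA
```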

Upvotes: 0

akuiper

Reputation: 215117

sub is more suitable than grep for your case here, since grep only tells you which elements match, while sub can return the captured part:

sub(".*NB\\d{2}(\\d{4}).*", "\\1", "160627_NB551043_0004_AHCJCWBGX")
# [1] "1043"

Or you can use str_extract from the stringr package:

library(stringr)
str_extract("160627_NB551043_0004_AHCJCWBGX", "(?<=NB\\d{2})\\d{4}")
# [1] "1043"

(?<=NB\\d{2})\\d{4} uses a lookbehind: it matches four digits that are immediately preceded by NB and two more digits, without including that prefix in the result.
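If you would rather not depend on stringr, the same lookbehind should also work in base R with regexpr and regmatches, provided perl = TRUE is set so that the lookbehind syntax is available. A minimal sketch (the variable names x and m are just for illustration):

```r
x <- "160627_NB551043_0004_AHCJCWBGX"

# perl = TRUE switches to PCRE, which supports the (?<=...) lookbehind
m <- regexpr("(?<=NB\\d{2})\\d{4}", x, perl = TRUE)

# regmatches returns only the matched text, i.e. the four digits
regmatches(x, m)
# [1] "1043"
```

Note that regmatches silently drops elements with no match, so on a vector the result can be shorter than the input.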

Upvotes: 1
