Reputation: 115
I have a question about the grep function in R.
I have a string like this:
"160627_NB551043_0004_AHCJCWBGX"
I don't need the whole name. What I need is only 1043. It's always going to be the last 4 digits in the NB section. Do you know how I can grep that with R?
Upvotes: 0
Views: 73
Reputation: 9313
This answer is similar to the one given by Psidom, but it also covers the following case:
What if after _NB you don't know how many digits are there? What if there are 7 or more digits?
The approach is to capture all the digits between _NB and _:
NBdigits = sub(".*_NB(\\d+)_.*", "\\1", "160627_NB551043_0004_AHCJCWBGX")
which gives:
[1] "551043"
and then get the last 4 digits by taking the value modulo 10000:
last4digits = as.numeric(NBdigits)%%10000
Result:
[1] 1043
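The modulo step works because dividing by 10000 splits off everything above the last four decimal digits. For the example value:

$$551043 = 55 \times 10000 + 1043 \quad\Rightarrow\quad 551043 \bmod 10000 = 1043$$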
Edit: A couple more examples
What if there are fewer than 4 digits after NB?
as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB43_0004_AHCJCWBGX"))%%10000
[1] 43
If there are exactly 4?
as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB9876_0004_AHCJCWBGX"))%%10000
[1] 9876
More than 6?
as.numeric(sub(".*_NB(\\d+)_.*", "\\1", "160627_NB987654321_0004_AHCJCWBGX"))%%10000
[1] 4321
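Since sub() and %% are both vectorized, the three examples above can also be run in a single call on a character vector (ids below simply collects the example strings):

```r
ids <- c("160627_NB43_0004_AHCJCWBGX",
         "160627_NB9876_0004_AHCJCWBGX",
         "160627_NB987654321_0004_AHCJCWBGX")

# capture all digits between _NB and _, then keep the last 4 via modulo
res <- as.numeric(sub(".*_NB(\\d+)_.*", "\\1", ids)) %% 10000
res
# [1]   43 9876 4321
```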
Since you say "I have a string like this", I can't assume that there are exactly 6 digits after NB. The only assumption I am making is that there is at least one digit, so this solution works for any number of digits after NB (but not zero!).
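One caveat: because the modulo result is numeric, leading zeros are dropped (an NB block of 550043 gives 43, not "0043"), and digit runs longer than about 15 digits exceed the exact-integer precision of a double. A minimal sketch that keeps the digits as text instead, using a hypothetical helper last4_nb() (not part of the answer above) that returns NA when there is no _NB<digits>_ block:

```r
# Hypothetical helper: return the last (up to) 4 digits of the NB block
# as character, so leading zeros are kept.
last4_nb <- function(x) {
  has_nb <- grepl("_NB\\d+_", x)              # does the string contain _NB<digits>_ ?
  digits <- sub(".*_NB(\\d+)_.*", "\\1", x)   # all digits between _NB and _
  n <- nchar(digits)
  out <- substr(digits, pmax(n - 3, 1), n)    # last 4 characters (or fewer)
  out[!has_nb] <- NA                          # no NB block: NA instead of a wrong value
  out
}

last4_nb(c("160627_NB551043_0004_AHCJCWBGX",
           "160627_NB550043_0004_AHCJCWBGX",
           "160627_NB43_0004_AHCJCWBGX"))
# [1] "1043" "0043" "43"
```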
Upvotes: 0
Reputation: 214927
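sub is more suitable than grep for your case, because grep only tells you which elements match the pattern, while sub can return the captured part: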
sub(".*NB\\d{2}(\\d{4}).*", "\\1", "160627_NB551043_0004_AHCJCWBGX")
# [1] "1043"
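One thing to watch with sub(): when the pattern does not match, it returns the input string unchanged rather than NA. A small sketch that guards against this with grepl(), assuming you would rather get NA for names that do not have NB followed by 2 + 4 digits (the second string is a made-up non-matching example):

```r
x <- c("160627_NB551043_0004_AHCJCWBGX", "160627_NB43_0004_AHCJCWBGX")
pat <- ".*NB\\d{2}(\\d{4}).*"

# without a guard, the non-matching element comes back as the whole input
sub(pat, "\\1", x)

# return NA wherever the pattern does not match
res <- ifelse(grepl(pat, x), sub(pat, "\\1", x), NA)
res
# [1] "1043" NA
```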
Or you can use str_extract from the stringr package:
library(stringr)
str_extract("160627_NB551043_0004_AHCJCWBGX", "(?<=NB\\d{2})\\d{4}")
# [1] "1043"
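In the pattern (?<=NB\\d{2})\\d{4}, the lookbehind (?<=NB\\d{2}) requires NB followed by two digits immediately before the match, so \\d{4} picks out the four digits that follow NB\\d{2}.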
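If you would rather not add a package dependency, base R can do the same lookbehind extraction: regexpr() with perl = TRUE finds the match position (PCRE accepts this fixed-length lookbehind), and regmatches() pulls out the matched text. Unlike grep(), which only reports which elements match, this returns the substring itself:

```r
x <- "160627_NB551043_0004_AHCJCWBGX"

# locate the four digits that follow "NB" plus two digits (PCRE lookbehind)
m <- regexpr("(?<=NB\\d{2})\\d{4}", x, perl = TRUE)

# extract the matched substring
regmatches(x, m)
# [1] "1043"
```

Note that with a vector input, regmatches() silently drops elements that have no match, so the result can be shorter than the input.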
Upvotes: 1