Reputation: 89
Currently this is what I came up with, but I feel like there has to be a better way, using only base R or dplyr.
I want to keep the number between the two - characters and remove everything else.
Expected output would be 10 here.
c = "2020-10-13"
a = gsub("^.*?-","",c)
a = gsub("-\\d*","", a)
Upvotes: 1
Views: 642
Reputation: 21908
If all the other elements are in the same format, this could help; however, it extracts by character position, so it may not work for other formats. It's simple:
library(stringr)
c = "2020-10-13"
str_sub(c, 6, -4L)
[1] "10"
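To illustrate the caveat about other formats, here is a minimal sketch in base R. It uses substr as a stand-in for str_sub (my substitution, so that no package is needed), and the second input string is a made-up example with a one-digit month. Splitting on the hyphen is shown only as one possible alternative that does not depend on fixed positions.

```r
# Hypothetical inputs: the second one has a one-digit month.
dates <- c("2020-10-13", "2020-1-13")

# Position-based extraction assumes the month always sits at characters 6-7.
by_position <- substr(dates, 6, 7)

# Splitting on "-" and taking the second piece does not depend on positions.
by_split <- vapply(strsplit(dates, "-", fixed = TRUE), `[`, character(1), 2)

print(by_position)  # "10" "1-"
print(by_split)     # "10" "1"
```

The position-based result picks up the hyphen for the shorter month, which is the situation the caveat above warns about.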
Alongside the other good answer you got, you can also extract the month with functions from the lubridate package:
library(lubridate)
month(ymd(c))
[1] 10
Upvotes: 0
Reputation: 21400
We can use str_extract and lookaround:
library(stringr)
str_extract(c, "(?<=-)\\d+(?=-)")
Here, (?<=-) and (?=-) are the lookbehind and the lookahead, respectively; they make sure that the one or more digits (\\d+) are extracted only if they are preceded and followed by a -.
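As a rough illustration of what the lookarounds change, the sketch below compares the pattern described above with a plain pattern that matches the hyphens directly. It uses base R's regexpr and regmatches with perl = TRUE instead of str_extract (my substitution, to avoid a package dependency), and the second input string is made up.

```r
# Hypothetical inputs for illustration.
x <- c("2020-10-13", "1999-01-31")

# Without lookarounds, the hyphens are part of the match.
with_hyphens <- regmatches(x, regexpr("-\\d+-", x))

# With lookarounds, the hyphens are required but not included in the match.
digits_only <- regmatches(x, regexpr("(?<=-)\\d+(?=-)", x, perl = TRUE))

print(with_hyphens)  # "-10-" "-01-"
print(digits_only)   # "10" "01"
```

Lookarounds need a PCRE-style engine, which is why perl = TRUE is set in the base R call.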
Upvotes: 0
Reputation: 11128
Using regmatches and gregexpr from base R, assuming this is the consistent format in your data (note that the result is a list with one element per input string):
string <- "2020-10-13"
g <- gregexpr('(?<=-)\\d+(?=-)', string, perl=TRUE)
regmatches(string, g)
Also, if it is a date, you can try this (without using regex):
format(as.Date(string), "%m")
Upvotes: 2
Reputation: 101149
Try gsub like this:
> gsub(".*?-(\\d+).*", "\\1", "2020-10-13")
[1] "10"
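One thing to keep in mind with this approach: gsub returns the input unchanged when the pattern does not match, so a string without hyphens passes through silently. A minimal sketch of a guard follows; the helper name middle_number is hypothetical, the second input is made up, and perl = TRUE is my addition so that the lazy quantifier follows PCRE semantics.

```r
# Hypothetical helper: return the digits after the first hyphen,
# or NA when the pattern does not match at all.
middle_number <- function(x) {
  pattern <- ".*?-(\\d+).*"
  out <- gsub(pattern, "\\1", x, perl = TRUE)
  # Mark inputs that never matched instead of returning them unchanged.
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

result <- middle_number(c("2020-10-13", "20201013"))
print(result)  # "10" NA
```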
Upvotes: 2