YYY

Reputation: 89

How to keep part of a char and remove anything else in R using gsub?

This is what I have come up with so far, but I feel there has to be a better way. Using only base R or dplyr, I want to keep the number between the two hyphens and remove everything else. The expected output here is 10.

c = "2020-10-13"

a = gsub("^.*?-","",c)

a = gsub("-\\d*","", a)

Upvotes: 1

Views: 642

Answers (4)

Anoushiravan R

Reputation: 21908

If all of your elements are in this same fixed-width format, extracting by position is a simple option, although it will not work for strings in other formats:

library(stringr)

c = "2020-10-13"

str_sub(c, 6, -4L)

[1] "10"
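Since the question asks for base R, the same fixed-position idea could probably be sketched with substr instead. The vector name dates is made up for illustration, and this assumes every element is in "YYYY-MM-DD" form:

```r
# Hypothetical input vector; assumes every element looks like "YYYY-MM-DD"
dates <- c("2020-10-13", "2021-03-07")

# In this fixed format, characters 6 and 7 hold the two-digit month
months <- substr(dates, 6, 7)
months
```

Like str_sub, this returns character strings, so a leading zero such as "03" is kept.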

In line with another good answer you got, if the string is a date you can also extract the month with functions from the lubridate package:

library(lubridate)

month(ymd(c))

[1] 10

Upvotes: 0

Chris Ruehlemann

Reputation: 21400

We can use str_extract and lookaround:

library(stringr)
str_extract(c, "(?<=-)\\d+(?=-)")

Here, (?<=-) is a lookbehind and (?=-) is a lookahead; together they make sure that the one or more digits (\\d+) are extracted only if they are preceded and followed by a -. The hyphens themselves are not part of the match.
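To see what the lookarounds buy you, here is a small base R sketch comparing a plain pattern with the lookaround version. The variable names are only for illustration, and perl = TRUE is needed because R's default regex engine does not support lookbehind:

```r
x <- "2020-10-13"

# Without lookarounds the hyphens are part of the match
with_hyphens <- regmatches(x, regexpr("-\\d+-", x))

# With lookarounds the hyphens are checked but not included in the match
digits_only <- regmatches(x, regexpr("(?<=-)\\d+(?=-)", x, perl = TRUE))

with_hyphens
digits_only
```

The first call should give "-10-" and the second "10", which is why the lookaround form needs no extra clean-up step.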

Upvotes: 0

PKumar

Reputation: 11128

Using regmatches and gregexpr from base R, assuming this format is consistent across your data (note that with gregexpr the result is a list):

string <- "2020-10-13"
g <- gregexpr('(?<=-)\\d+(?=-)', string, perl=TRUE)
regmatches(string, g)

Also, if the string is a date, you can try this without using any regex:

format(as.Date(string), "%m")
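One thing to keep in mind is that format returns a character string, so the month keeps its leading zero. If you need a number instead, a minimal sketch could look like this (the names dates, month_chr and month_int are just placeholders, and the input is assumed to be in the default "YYYY-MM-DD" form that as.Date parses):

```r
# Hypothetical input; assumes the default ISO format that as.Date() understands
dates <- c("2020-10-13", "2021-03-07")

# format() gives zero-padded character months
month_chr <- format(as.Date(dates), "%m")

# Convert to integer if a numeric month is wanted
month_int <- as.integer(month_chr)

month_chr
month_int
```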

Upvotes: 2

ThomasIsCoding

Reputation: 101149

Try gsub with a capture group, like this:

> gsub(".*?-(\\d+).*", "\\1", "2020-10-13")
[1] "10"
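The idea is to match the whole string and replace it with only the captured digits (\\1). As a rough sketch of the same approach on a vector, here is a slightly stricter variant that I wrote for illustration; the anchored pattern and the names dates and middle are my own assumptions, not something from the question:

```r
# Hypothetical input, including one element that does not fit the format
dates <- c("2020-10-13", "2021-03-07", "no hyphens")

# ^[^-]*-   everything up to and including the first hyphen
# (\\d+)    the digits to keep (capture group 1)
# -.*$      the second hyphen and the rest of the string
middle <- sub("^[^-]*-(\\d+)-.*$", "\\1", dates)
middle
```

Since the pattern consumes the whole string, sub is enough here. Elements that do not match are returned unchanged, so you may want to check for those separately.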

Upvotes: 2
