Ricky

Reputation: 4686

Unable to replace string with back reference using gsub in R

I am trying to replace some text in a character vector using regex in R where, if there is a set of letters inside parentheses, the parenthesised content should replace the whole string. So, given the input:

tst <- c("85", "86 (TBA)", "87 (LAST)")

my desired output would be equivalent to c("85", "TBA", "LAST")

I tried gsub("\\(([[:alpha:]])\\)", "\\1", tst) but it didn't replace anything. What do I need to correct in my regular expression here?

Upvotes: 2

Views: 4173

Answers (4)

MrFlick

Reputation: 206253

I think you want

gsub(".*\\(([[:alpha:]]+)\\)", "\\1", tst)
# [1] "85"   "TBA"  "LAST"

Your original expression tried to match exactly one alphabetic character rather than one or more. I also added ".*" to match the beginning of the string so that it gets replaced as well; otherwise it would be left untouched.
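To make the two changes easier to see, here is a small sketch that applies the fix in steps to the input from the question. The variable names (`one_letter`, `with_plus`, `with_prefix`) are only illustrative.

```r
tst <- c("85", "86 (TBA)", "87 (LAST)")

# Original attempt: [[:alpha:]] without a quantifier matches a single letter,
# so "(TBA)" and "(LAST)" never match and nothing is replaced.
one_letter <- gsub("\\(([[:alpha:]])\\)", "\\1", tst)
one_letter
# [1] "85"        "86 (TBA)"  "87 (LAST)"

# Adding "+" matches the whole parenthesised word, but the leading number stays.
with_plus <- gsub("\\(([[:alpha:]]+)\\)", "\\1", tst)
with_plus
# [1] "85"      "86 TBA"  "87 LAST"

# Prefixing ".*" also consumes everything before the opening parenthesis.
with_prefix <- gsub(".*\\(([[:alpha:]]+)\\)", "\\1", tst)
with_prefix
# [1] "85"   "TBA"  "LAST"
```

Elements without parentheses, such as "85", do not match at all and are returned unchanged.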

Upvotes: 8

vks

Reputation: 67968

gsub("(?=.*\\([^)]*\\)).*\\(([^)]*)\\)", "\\1", tst, perl=TRUE)
## [1] "85"   "TBA"  "LAST"

You can try this. See the demo. Replace by \1.

https://regex101.com/r/sH8aR8/38

Upvotes: 2

Tyler Rinker

Reputation: 109874

I like the purely regex answers better. I'm showing a solution using the qdapRegex package that I maintain, as the result is pretty speedy and easy to remember and generalize. It pulls out the strings that are in parentheses and then replaces any NA (no parentheses) with the original value. Note that the result is a list and you'd need to use unlist to match your desired output.

library(qdapRegex)
m <- rm_round(tst, extract=TRUE)
m[is.na(m)] <- tst[is.na(m)]
m

## [[1]]
## [1] "85"
## 
## [[2]]
## [1] "TBA"
## 
## [[3]]
## [1] "LAST"
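For comparison, the same extract-then-fall-back idea can be sketched in base R. This is not how qdapRegex works internally; it is only an illustration using `regexpr` and `regmatches`, and the names `hits` and `out` are made up for the example. It returns a character vector directly, so no unlist is needed.

```r
tst <- c("85", "86 (TBA)", "87 (LAST)")

# Locate the first "(...)" group in each element; -1 marks elements without one.
hits <- regexpr("\\(([^)]*)\\)", tst)

# Start from the original values so unmatched elements are kept as they are.
out <- tst

# regmatches() returns the matched text only for elements that matched;
# strip the surrounding parentheses before assigning.
out[hits != -1] <- gsub("[()]", "", regmatches(tst, hits))
out
# [1] "85"   "TBA"  "LAST"
```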

Upvotes: 1

Ruthger Righart

Reputation: 4921

The following would work. Note that whitespace within the parentheses may be problematic.

A <- sapply(strsplit(tst, " "), tail, 1)
B <- gsub("\\(|\\)", "", A)
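The whitespace caveat can be illustrated with a made-up input, `tst2`, which is not part of the question. Splitting on spaces keeps only the last word inside the parentheses, whereas a pattern built on `[^)]*` (an assumption about what would be wanted in that case) keeps the full content.

```r
# Hypothetical input with a space inside the parentheses.
tst2 <- c("85", "88 (TO BE ANNOUNCED)")

# Split-on-space approach: only the last token survives.
A2 <- sapply(strsplit(tst2, " "), utils::tail, 1)
B2 <- gsub("\\(|\\)", "", A2)
B2
# [1] "85"        "ANNOUNCED"

# Regex approach: capture everything between "(" and the next ")".
C2 <- sub(".*\\(([^)]*)\\).*", "\\1", tst2)
C2
# [1] "85"              "TO BE ANNOUNCED"
```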

Upvotes: 1
