jwcb1025

Reputation: 131

How can I use gsub to remove specific characters before and after an arbitrary character in string

I am attempting to use gsub to remove characters from the following string:

string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"

The new string should return:

cat(string)
function(data, x = Time1, y = YVAR, values = c("a", "b"))

That is, I'd like to remove !!rlang::sym(\", keep Time1, and remove the closing quote and parenthesis \") that follow Time1 (and do the same for YVAR).

Time1 and YVAR (the x and y variable names) are arbitrary and could be anything. However, the leading !!rlang::sym(\" and the closing quote and parenthesis \") around the name to keep are constant and will not change.

I understand I can simply use

result <- gsub("!!rlang::sym(\"", "", string, fixed = TRUE)

then

result <- gsub("\")", "", result, fixed = TRUE)

to get part of the way there. However, I'd like a more elegant regex solution that combines both gsub calls and, of course, does not remove the closing "\")" in values = c(\"a\", \"b\"))"
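For reference, here is a minimal sketch of what the two fixed-string calls produce on the sample string. The names step1 and step2 are only illustrative. It shows the over-stripping problem: the second call also eats the "\")" that closes c(\"a\", \"b\").

```r
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"

# Step 1: drop the constant prefix (fixed = TRUE, so nothing needs regex escaping)
step1 <- gsub("!!rlang::sym(\"", "", string, fixed = TRUE)

# Step 2: drop every literal quote-plus-parenthesis, wherever it occurs
step2 <- gsub("\")", "", step1, fixed = TRUE)

writeLines(step2)
# function(data, x = Time1, y = YVAR), values = c("a", "b)
```

The closing quote of "b" is lost, which is why the two removals need to be tied together in one pattern.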

Upvotes: 2

Views: 1359

Answers (3)

Ryszard Czech

Reputation: 18641

Use

result <- gsub("!!rlang::sym\\(\"([\\w\\W]*?)\"\\)", "\\1", string, perl=TRUE)


Explanation

--------------------------------------------------------------------------------
  !!rlang::sym             '!!rlang::sym'
--------------------------------------------------------------------------------
  \(                       '('
--------------------------------------------------------------------------------
  \"                       '"'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\w\W]*?                 any character of: word characters (a-z,
                             A-Z, 0-9, _), non-word characters (all
                             but a-z, A-Z, 0-9, _) (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \"                       '"'
--------------------------------------------------------------------------------
  \)                       ')'

See R proof:

string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"
result <- gsub("!!rlang::sym\\(\"([\\w\\W]*?)\"\\)", "\\1", string, perl=TRUE)
cat(result)

Results: function(data, x = Time1, y = YVAR), values = c("a", "b"))

Upvotes: 1

r2evans

Reputation: 161110

If it's always the literal !!rlang::sym(", then this works:

cat( gsub('!!rlang::sym\\("(\\S+)"\\)', "\\1", string), "\n" )
# function(data, x = Time1, y = YVAR), values = c("a", "b")) 

If it's a function-call/paren/quote, then it can be generalized a little. I'd think you'd want some specificity, since otherwise you'll be parsing out a lot more than you want. I'll assume that rlang is required:

gsub('\\S+rlang\\S+\\("(\\S+)"\\)', "\\1", string)

Note that there are two right-parens in your sample string, !!rlang::sym(\"YVAR\")), which are thwarting the pattern just a little: only the first one is removed. If that's real, then either look for repeats by ending the pattern with "\\)+ or ... something else.
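A rough sketch of the repeated-paren idea, assuming the sample string really does contain the doubled parenthesis. The variable names pattern, result and edge are only illustrative.

```r
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"

# Same pattern as above, but "\\)+" consumes one or more closing parentheses
pattern <- '!!rlang::sym\\("(\\S+)"\\)+'

result <- gsub(pattern, "\\1", string)
writeLines(result)
# function(data, x = Time1, y = YVAR, values = c("a", "b"))

# Caveat: if the sym() call is the last argument, the enclosing call's own
# closing parenthesis is swallowed as well
edge <- gsub(pattern, "\\1", 'f(x = !!rlang::sym("a"))')
writeLines(edge)
# f(x = a
```

This reproduces the output requested in the question, but it is only safe when every run of ")" after the quoted name really belongs to the sym() call.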

Upvotes: 2

The fourth bird

Reputation: 163632

You could use a single pattern with a capture group which will match any character except ", and use group 1 in the replacement.

!!rlang::sym\("([^"]+)"\)


string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"
cat(gsub('!!rlang::sym\\("([^"]+)"\\)', "\\1", string))

Output

function(data, x = Time1, y = YVAR), values = c("a", "b"))
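One place where the choice of capture group matters is a name that contains whitespace. This is a small sketch on a made-up input (the string s and the name "Time 1" are hypothetical, not taken from the question), comparing \\S+ with [^"]+:

```r
# Hypothetical input: a column name containing a space
s <- "aes(x = !!rlang::sym(\"Time 1\"))"

# \\S+ cannot cross the space, so nothing matches and s comes back unchanged
by_nonspace <- gsub('!!rlang::sym\\("(\\S+)"\\)', "\\1", s)
writeLines(by_nonspace)
# aes(x = !!rlang::sym("Time 1"))

# [^"]+ accepts anything up to the next double quote, including spaces
by_nonquote <- gsub('!!rlang::sym\\("([^"]+)"\\)', "\\1", s)
writeLines(by_nonquote)
# aes(x = Time 1)
```

Whether unwrapping such a name is desirable depends on what the resulting string is used for, since Time 1 without backticks is not a valid R symbol.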


Upvotes: 1
