Reputation: 131
I am attempting to use gsub to remove characters from the following string:
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"
The new string should return:
cat(string)
function(data, x = Time1, y = YVAR, values = c("a", "b"))
That is to say, I'd like to remove !!rlang::sym(\", keep Time1, and remove the closing quote and parenthesis \") after Time1 (and I'd also like to do the same for YVAR).
Time1 and YVAR (the x and y variable names) are arbitrary and can be named anything in the resulting string; however, the characters !!rlang::sym(\" and the closing quote and parenthesis \") after the arbitrary string that needs to be kept are constant and will not change.
I understand I can simply use
result <- gsub("!!rlang::sym(\"", "", string, fixed = TRUE)
then
result <- gsub("\")", "", result, fixed = TRUE)
to get part of the way there; however, I'd like to find a more elegant regex solution that combines both of these gsub calls and, of course, does not remove the closing \") in values = c(\"a\", \"b\")).
Upvotes: 2
Views: 1359
Reputation: 18641
Use
result <- gsub("!!rlang::sym\\(\"([\\w\\W]*?)\"\\)", "\\1", string, perl=TRUE)
Explanation
--------------------------------------------------------------------------------
  !!rlang::sym             '!!rlang::sym'
--------------------------------------------------------------------------------
  \(                       '('
--------------------------------------------------------------------------------
  \"                       '"'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\w\W]*?                 any character of: word characters (a-z,
                             A-Z, 0-9, _), non-word characters (all
                             but a-z, A-Z, 0-9, _) (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  \)                       ')'
See R proof:
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"
result <- gsub("!!rlang::sym\\(\"([\\w\\W]*?)\"\\)", "\\1", string, perl=TRUE)
cat(result)
Results: function(data, x = Time1, y = YVAR), values = c("a", "b"))
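To see why the lazy quantifier *? matters here, the following sketch runs the same pattern with a greedy * for comparison. The greedy variant is not part of the answer; it is only an illustration, and the variable names lazy and greedy are made up for this example.

```r
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"

# Lazy: each capture stops at the first quote-plus-paren that follows it,
# so both !!rlang::sym("...") calls are unwrapped separately.
lazy <- gsub("!!rlang::sym\\(\"([\\w\\W]*?)\"\\)", "\\1", string, perl = TRUE)

# Greedy (for comparison only): the capture runs on to the LAST quote-plus-paren
# in the string, so the second !!rlang::sym(" is swallowed into the capture.
greedy <- gsub("!!rlang::sym\\(\"([\\w\\W]*)\"\\)", "\\1", string, perl = TRUE)

cat(lazy, "\n")
cat(greedy, "\n")

# The greedy result still contains a leftover wrapper
grepl("!!rlang::sym", greedy, fixed = TRUE)
```

With the greedy version only one replacement happens, and it spans most of the string, which is why the answer uses *?.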
Upvotes: 1
Reputation: 161110
If it's always the literal !!rlang::sym(", then this:
cat( gsub('!!rlang::sym\\("(\\S+)"\\)', "\\1", string), "\n" )
# function(data, x = Time1, y = YVAR), values = c("a", "b"))
If it's a function-call/paren/quote, then it can be generalized a little. I'd think you'd want some specificity, since otherwise you'll be parsing out a lot more than you want. I'll assume that rlang is required:
gsub('\\S+rlang\\S+\\("(\\S+)"\\)', "\\1", string)
Note that there are two right-parens in your sample string, !!rlang::sym(\"YVAR\")), which are thwarting the pattern just a little. If that's real, then ... either look for repeats with "\\)+ or ... something else.
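A minimal sketch of the "look for repeats" idea, assuming the doubled right-paren really is part of the input and should be dropped along with the wrapper. It uses [^"]+ for the capture rather than \\S+; that substitution is my choice for the illustration, not something stated above.

```r
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"

# \\)+ consumes one OR MORE right-parens after the closing quote,
# so the stray extra ")" after YVAR is removed together with the wrapper.
result <- gsub('!!rlang::sym\\("([^"]+)"\\)+', "\\1", string)
cat(result, "\n")
# function(data, x = Time1, y = YVAR, values = c("a", "b"))
```

One caveat: because \\)+ eats every right-paren that directly follows the wrapper, it would also remove a legitimate closing paren if a !!rlang::sym(...) call were the last argument of the outer call.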
Upvotes: 2
Reputation: 163632
You could use a single pattern with a capture group which will match any character except ", and use group 1 in the replacement.
!!rlang::sym\("([^"]+)"\)
string <- "function(data, x = !!rlang::sym(\"Time1\"), y = !!rlang::sym(\"YVAR\")), values = c(\"a\", \"b\"))"
cat(gsub('!!rlang::sym\\("([^"]+)"\\)', "\\1", string))
Output
function(data, x = Time1, y = YVAR), values = c("a", "b"))
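Since the variable names are arbitrary, the same pattern could be wrapped in a small helper and applied to a vector of strings. This is only a sketch: strip_sym and the two sample calls are hypothetical names made up for this example, not part of the question.

```r
# Hypothetical helper: unwrap every !!rlang::sym("...") call in x
strip_sym <- function(x) {
  gsub('!!rlang::sym\\("([^"]+)"\\)', "\\1", x)
}

# Made-up inputs; the second name contains a space on purpose
calls <- c(
  'f(x = !!rlang::sym("Time1"), y = !!rlang::sym("YVAR"))',
  'g(col = !!rlang::sym("my var"), values = c("a", "b"))'
)

out <- strip_sym(calls)
print(out)
# [1] "f(x = Time1, y = YVAR)"
# [2] "g(col = my var, values = c(\"a\", \"b\"))"
```

Because the capture is "anything but a double quote", it also accepts names containing spaces, and it leaves c("a", "b") alone since that part is not preceded by !!rlang::sym(.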
Upvotes: 1