Reputation: 2507
I have a group of variables, var:
> var
[1] "a1" "a2" "a3" "a4"
Here is what I want to achieve: use a regex to change strings such as this:
3*a1 + a1*a2 + 4*a3*a4 + a1*a3
to
3a1 + a1*a2 + 4a3*a4 + a1*a3
Basically, I want to remove every "*" that is not between two values in var. Thank you in advance.
Upvotes: 4
Views: 212
Reputation: 2507
Thanks to @alistaire for offering a solution with a non-capturing group. However, that solution relies on there being a space between the coefficient and the "+" in front of it. Here's my modified solution based on his suggestion:
> ss <- "3*a1 + a1*a2+4*a3*a4 +2*a1*a3+ 4*a2*a3"
# my modified version
> gsub('((?:^|\\s|\\+|\\-)\\d)\\*(\\w)', '\\1\\2', ss)
[1] "3a1 + a1*a2+4a3*a4 +2a1*a3+ 4a2*a3"
# alistaire's
> gsub('((?:^| )\\d)\\*(\\w)', '\\1\\2', ss)
[1] "3a1 + a1*a2+4*a3*a4 +2*a1*a3+ 4a2*a3"
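Both patterns above use \\d, so they only handle single-digit coefficients. The following is a minimal sketch, assuming coefficients may have several digits: it swaps in \\d+ and wraps the call in a helper. The name drop_coef_star is hypothetical, and perl = TRUE is an assumption made so that the shorthand classes are handled by the PCRE engine.

```r
# Hypothetical helper: remove the "*" that directly follows a leading
# numeric coefficient (start of string, or after whitespace, "+" or "-").
drop_coef_star <- function(x) {
  gsub("((?:^|\\s|\\+|-)\\d+)\\*(\\w)", "\\1\\2", x, perl = TRUE)
}

ss <- "3*a1 + a1*a2+4*a3*a4 +2*a1*a3+ 4*a2*a3"
drop_coef_star(ss)
# [1] "3a1 + a1*a2+4a3*a4 +2a1*a3+ 4a2*a3"

# Multi-digit coefficients are covered by \\d+
drop_coef_star("12*a1 - 30*a2*a3")
# [1] "12a1 - 30a2*a3"
```

Note that this still keys on the coefficient rather than on the names in var, so it treats any word character after the asterisk as a variable.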
Upvotes: 0
Reputation:
You can find (?<![\da-z])(\d+)\*
and replace with $1
(?<! [\da-z] )
( \d+ ) # (1)
\*
Or, for engines without lookbehind support, use ((?:[^\da-z]|^)\d+)\*
( # (1 start)
(?: [^\da-z] | ^ )
\d+
) # (1 end)
\*
Leading assertions are slow anyway; in this benchmark the version without one is faster:
Regex1: (?<![\da-z])(\d+)\*
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 1.09 s, 1087.84 ms, 1087844 µs
Regex2: ((?:[^\da-z]|^)\d+)\*
Options: < none >
Completed iterations: 100 / 100 ( x 1000 )
Matches found per iteration: 2
Elapsed Time: 0.77 s, 767.04 ms, 767042 µs
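Since the question is about R, here is a rough translation of the two patterns into gsub calls. This is a sketch, not part of the original answer: the backslashes are doubled for R string literals, $1 becomes \\1, and perl = TRUE is needed for the lookbehind in the first variant. The variable names s, r1 and r2 are only for illustration.

```r
s <- "3*a1 + a1*a2 + 4*a3*a4 + a1*a3"

# Variant 1: negative lookbehind, so the digits must not be preceded
# by a digit or a lowercase letter (requires the PCRE engine).
r1 <- gsub("(?<![\\da-z])(\\d+)\\*", "\\1", s, perl = TRUE)
r1
# [1] "3a1 + a1*a2 + 4a3*a4 + a1*a3"

# Variant 2: no lookbehind; the preceding character (or the start of
# the string) is captured together with the digits and put back.
r2 <- gsub("((?:[^\\da-z]|^)\\d+)\\*", "\\1", s, perl = TRUE)
r2
# [1] "3a1 + a1*a2 + 4a3*a4 + a1*a3"
```

Both variants assume lowercase variable names. With uppercase names, the class [\\da-z] would have to be widened, for example to [\\dA-Za-z].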
Upvotes: 3
Reputation: 43344
Taking the equation as a string, one option is
gsub('((?:^| )\\d)\\*(\\w)', '\\1\\2', '3*a1 + a1*a2 + 4*a3*a4 + a1*a3')
# [1] "3a1 + a1*a2 + 4a3*a4 + a1*a3"
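which looks for
( ... ) - a capture group containing
(?: ... ) - a non-capturing group matching either
^ - the beginning of the string
| - or
a space (or \\s for any whitespace character)
\\d - followed by a digit;
\\* - then a literal asterisk;
( ... ) - then a second capture group containing
\\w - a word character.
It replaces the above with
\\1 - the first capture group,
\\2 - followed by the second capture group.
Adjust as necessary.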
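To see what the two capture groups actually hold, the first match can be inspected with regexec and regmatches. This is only an illustrative sketch: the names x, pat and m are made up here, and perl = TRUE is an assumption.

```r
x <- "3*a1 + a1*a2 + 4*a3*a4 + a1*a3"
pat <- "((?:^| )\\d)\\*(\\w)"

# regexec returns the first match plus its capture groups;
# regmatches extracts the corresponding substrings.
m <- regmatches(x, regexec(pat, x, perl = TRUE))[[1]]
m
# [1] "3*a" "3"   "a"
```

The first element is the whole match, the second is group 1 (the coefficient with whatever precedes it) and the third is group 2 (the first character of the variable). Pasting groups 1 and 2 back together is what drops the asterisk.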
Upvotes: 1
Reputation: 626920
You can build a dynamic regex out of var to match and capture the * characters that sit between your variables, restore them with a backreference in gsub, and remove all other asterisks:
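var <- c("a1", "a2", "a3", "a4")
s <- "3*a1 + a1*a2 + 4*a3*a4 + a1*a3"
# Alternation of all variable names: a1|a2|a3|a4
block <- paste(var, collapse = "|")
# Branch 1 captures "<var>*" when another variable follows; branch 2 matches any other "*"
pat <- paste0("\\b((?:", block, ")\\*)(?=\\b(?:", block, ")\\b)|\\*")
# Group 1 is empty when branch 2 matched, so those asterisks are dropped
gsub(pat, "\\1", s, perl = TRUE)
## "3a1 + a1*a2 + 4a3*a4 + a1*a3"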
See the IDEONE demo
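The same idea can be wrapped in a small function so that it works for any set of variable names. This is a minimal sketch: the name keep_var_stars is hypothetical, and it assumes the names contain no regex metacharacters (they are pasted into the pattern unescaped).

```r
# Hypothetical helper: keep "*" only between two names listed in `vars`,
# and remove every other asterisk.
keep_var_stars <- function(s, vars) {
  block <- paste(vars, collapse = "|")
  pat <- paste0("\\b((?:", block, ")\\*)(?=\\b(?:", block, ")\\b)|\\*")
  gsub(pat, "\\1", s, perl = TRUE)
}

keep_var_stars("3*a1 + a1*a2 + 4*a3*a4 + a1*a3", c("a1", "a2", "a3", "a4"))
# [1] "3a1 + a1*a2 + 4a3*a4 + a1*a3"

keep_var_stars("2*x*y + x*y", c("x", "y"))
# [1] "2x*y + x*y"
```

Because the second branch removes every asterisk that the first branch did not keep, an asterisk between a listed variable and an unlisted name is removed as well.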
Here is the regex:
\b((?:a1|a2|a3|a4)\*)(?=\b(?:a1|a2|a3|a4)\b)|\*
Details:
\b - leading word boundary
((?:a1|a2|a3|a4)\*) - Group 1 matching
(?:a1|a2|a3|a4) - either one of your variables
\* - asterisk
(?=\b(?:a1|a2|a3|a4)\b) - a lookahead check that there must be one of your variables (otherwise, no match is returned, the * is matched with the second branch of the alternation)
| - or
\* - a "wild" literal asterisk to be removed.
Upvotes: 2