alberto

Reputation: 153

Replacing only certain characters inside brackets (R)

I'm finding it a bit difficult to write a regular expression that converts a string of the type:

[1] "[hola;adios] address1;[hola;adios] address2"

into:

[1] "[hola|adios] address1;[hola|adios] address2"

that is, replacing the semicolons inside the brackets with vertical bars. The attempts I've made either fail to restrict the replacement to the semicolons inside the brackets (the ones outside are also replaced), or they replace the entire substring [hola;adios] with a vertical bar.

I'd be very grateful if someone could give me some pointers on how to accomplish this in R.

Upvotes: 2

Views: 141

Answers (2)

hwnd

Reputation: 70722

Using the gsubfn package, you could avoid having to use lookarounds.

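library(gsubfn)
x <- '[hola;adios] address1;[hola;adios] address2'
# Match each bracketed group; inside the formula, x is the matched text,
# so only the semicolons within that match are replaced
gsubfn('\\[[^]]*]', ~ gsub(';', '|', x), x)
# [1] "[hola|adios] address1;[hola|adios] address2"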
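
If you would rather not add a package, the same "match the bracketed groups, then edit only those" idea can be sketched in base R with gregexpr and regmatches<-. This is only an illustration of the approach and not part of the gsubfn solution; the variable names m and result are made up for the example.

```r
x <- "[hola;adios] address1;[hola;adios] address2"

# Locate every bracketed group, e.g. "[hola;adios]"
m <- gregexpr("\\[[^]]*\\]", x)

# Replace the semicolons inside each matched group only,
# then write the edited groups back into a copy of the string
result <- x
regmatches(result, m) <- lapply(
  regmatches(x, m),
  function(s) gsub(";", "|", s, fixed = TRUE)
)

result
# [1] "[hola|adios] address1;[hola|adios] address2"
```

Because the replacement is applied to the extracted matches, the semicolons outside the brackets are never touched.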

Upvotes: 1

Avinash Raj

Reputation: 174696

You could try the gsub commands below.

> x <- '[hola;adios] address1;[hola;adios] address2'
> gsub(";(?=[^\\[\\]]*\\])", "|", x, perl=T)
[1] "[hola|adios] address1;[hola|adios] address2"

;(?=[^\\[\\]]*\\]) matches a semicolon only if it is followed by:

  • [^\[\]]* any character other than [ or ], zero or more times, and then
  • \] a closing square bracket.

So this matches only the semicolons inside the square brackets. (?=...) is called a positive lookahead assertion: it checks what follows without consuming it. The backslashes are doubled in the R string because R itself treats \ as an escape character.


OR

> gsub(";(?![^\\[\\]]*\\[)", "|", x, perl=T)
[1] "[hola|adios] address1;[hola|adios] address2"

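(?!...) is called a negative lookahead assertion, which does the opposite of a positive lookahead. Here the semicolon matches only if the next bracket after it is not an opening square bracket.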
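
The two patterns agree on the example string, but they can differ on other input. As a rough sketch, here they are applied to a made-up string y (not from the question) that has a semicolon after the last bracketed group; the names y, pos and neg are hypothetical.

```r
# Hypothetical input: note the trailing ";g" after the last "]"
y <- "[a;b] c;[d;e] f;g"

# Positive lookahead: a "]" must come before any other bracket
pos <- gsub(";(?=[^\\[\\]]*\\])", "|", y, perl = TRUE)
pos
# [1] "[a|b] c;[d|e] f;g"

# Negative lookahead: only requires that no "[" comes next
neg <- gsub(";(?![^\\[\\]]*\\[)", "|", y, perl = TRUE)
neg
# [1] "[a|b] c;[d|e] f|g"
```

The negative-lookahead version also replaces the last semicolon, because nothing after it is an opening bracket. If your strings can contain semicolons after the final bracketed group, the positive-lookahead version is probably the safer choice. Both assume the brackets are balanced and not nested.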

Upvotes: 3
