Replacing only certain characters inside brackets (R)

Question

I'm finding it a bit difficult to write a regular expression that converts a string of the type:

[1] "[hola;adios] address1;[hola;adios] address2"

into:

[1] "[hola|adios] address1;[hola|adios] address2"

that is, replacing the semicolons inside the brackets with vertical bars. The attempts I've made either also replace the semicolons outside the brackets, or replace the entire substring [hola;adios] with a vertical bar.

I'd be very grateful if someone could give me some pointers on how to accomplish this in R.

Avinash Raj · Accepted Answer

You could try the following gsub commands.

> x <- '[hola;adios] address1;[hola;adios] address2'
> gsub(";(?=[^\\[\\]]*\\])", "|", x, perl=TRUE)
[1] "[hola|adios] address1;[hola|adios] address2"

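;(?=[^\[\]]*\]) (written as ";(?=[^\\[\\]]*\\])" inside an R string, where backslashes must be doubled) matches a semicolon only if it is followed by:

[^\[\]]* any character except [ or ], zero or more times, and
\] a closing square bracket.

So it matches exactly the semicolons that sit inside [...], assuming the brackets are balanced and not nested. The (?=...) construct is called a positive lookahead assertion: it checks what follows the semicolon without consuming it.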
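
A minimal sketch that wraps the positive-lookahead pattern in a helper and applies it to a vector containing several semicolons per bracket. The name brackets_to_pipes is hypothetical, not part of any package, and the sketch assumes brackets are balanced and never nested. Because gsub() works element-wise, the helper handles whole character vectors.

```r
# Hypothetical helper: replace ";" with "|" only inside [...] groups.
# Assumes square brackets are balanced and never nested.
brackets_to_pipes <- function(s) {
  # A ";" followed by non-bracket characters and then "]" lies inside a bracket.
  gsub(";(?=[^\\[\\]]*\\])", "|", s, perl = TRUE)
}

x <- c("[hola;adios] address1;[hola;adios] address2",
       "[a;b;c] addr1;[d;e] addr2")
res <- brackets_to_pipes(x)
# res[1] is "[hola|adios] address1;[hola|adios] address2"
# res[2] is "[a|b|c] addr1;[d|e] addr2"
```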

Or, with a negative lookahead:

> gsub(";(?![^\\[\\]]*\\[)", "|", x, perl=TRUE)
[1] "[hola|adios] address1;[hola|adios] address2"

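(?!...) is called a negative lookahead assertion and does the opposite of a positive lookahead: it succeeds only when the enclosed pattern does not match at that position. Here a semicolon is replaced unless it is followed, after any run of non-bracket characters, by an opening [.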
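
The two patterns agree on the question's input but can differ on a semicolon that has no bracket after it: the negative lookahead finds no following [ and so replaces it. The sketch below shows this on an illustrative string y, then adds a lookahead-free alternative (a different technique from the answer's, included only for comparison) that locates each [...] group with base R's gregexpr() and substitutes only within those matches via regmatches<-. The helper name pipes_in_brackets is hypothetical, and the same balanced, non-nested bracket assumption applies.

```r
y <- "[a;b] addr1;addr2"

# Positive lookahead: the last ";" is not followed by "]", so it is kept.
pos <- gsub(";(?=[^\\[\\]]*\\])", "|", y, perl = TRUE)  # "[a|b] addr1;addr2"

# Negative lookahead: the last ";" is not followed by "[", so it is replaced.
neg <- gsub(";(?![^\\[\\]]*\\[)", "|", y, perl = TRUE)  # "[a|b] addr1|addr2"

# Hypothetical lookahead-free helper: find each [...] group, then replace
# ";" only inside the matched substrings.
pipes_in_brackets <- function(s) {
  m <- gregexpr("\\[[^\\[\\]]*\\]", s, perl = TRUE)
  regmatches(s, m) <- lapply(regmatches(s, m), gsub,
                             pattern = ";", replacement = "|", fixed = TRUE)
  s
}

alt <- pipes_in_brackets(y)  # "[a|b] addr1;addr2"
```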
