Matifou

Reputation: 8880

R: replace interior square brackets in presence of multiple brackets

I am trying to convert an expression such as [[a], [b]] into list(c(a), c(b)) (basically a Java dictionary into an R list). As a first step, I would like to convert each inner expression [a] into the equivalent c(a). According to "How to replace square brackets with curly brackets using R's regex?", I can use a regular expression such as "\\[(.*?)\\]" or "\\[([^]]*)\\]".

This works when there is only one pair of brackets, but not when they are nested, as in [[a], [b]]: the match starts at the first opening bracket, giving "c([a), c(b])" instead of "[c(a), c(b)]". How can I make sure I match only the innermost brackets in a string that contains nested ones, such as [[], []]?

vec <- c("[a]", "[[a], [b]]")
gsub("\\[(.*?)\\]", "c(\\1)", vec)
#> [1] "c(a)"         "c([a), c(b])"
gsub("\\[([^]]*)\\]", "c(\\1)", vec)
#> [1] "c(a)"         "c([a), c(b)]"

Created on 2021-02-15 by the reprex package (v0.3.0)

Upvotes: 0

Views: 130

Answers (1)

r2evans

Reputation: 160407

While "Remove any text inside square brackets in r" shows how to deal with the regex itself, it doesn't address the "nested" component of the problem.

You can run the substitution repeatedly until the result stops changing.

vec <- c("[a]", "[[a], [b]]")
(vec2 <- gsub("\\[([^][]*)\\]", "c(\\1)", vec))
# [1] "c(a)"         "[c(a), c(b)]"
(vec3 <- gsub("\\[([^][]*)\\]", "c(\\1)", vec2))
# [1] "c(a)"          "c(c(a), c(b))"

The change is to exclude both the opening [ and the closing ] from the character class, so the pattern can only match an innermost pair, that is, one with no brackets between its delimiters.
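
To see which pieces the pattern picks out before anything is replaced, one option is to extract the matches with regmatches() and gregexpr(). This is only an illustrative check; the variable names x and innermost are made up for the example.

```r
# Illustrative check: list the substrings matched by the bracket-free pattern.
x <- "[[a], [b]]"

# [^][] is a negated character class that excludes both ']' and '['.
innermost <- regmatches(x, gregexpr("\\[[^][]*\\]", x))[[1]]
print(innermost)
```

Only the two inner groups should be reported; the outer pair is skipped on this pass because it contains brackets.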

It should be feasible to nest this in a while loop that exits as soon as no change is detected.
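
A minimal sketch of such a loop follows. The function name replace_brackets and the max_iter safety cap are assumptions added for illustration, not part of the original answer.

```r
# Hypothetical helper: rewrite innermost [...] groups as c(...) until a pass
# leaves the input unchanged. max_iter is only a guard against runaway loops.
replace_brackets <- function(x, max_iter = 100L) {
  pattern <- "\\[([^][]*)\\]"
  for (i in seq_len(max_iter)) {
    new_x <- gsub(pattern, "c(\\1)", x)
    if (identical(new_x, x)) break  # nothing left to replace
    x <- new_x
  }
  x
}

vec <- c("[a]", "[[a], [b]]")
replace_brackets(vec)
# [1] "c(a)"          "c(c(a), c(b))"
```

Each pass strips one level of nesting, so the number of substituting passes equals the deepest nesting level in the input. Unbalanced brackets are simply left in place once no complete innermost pair remains.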

Upvotes: 2
