Reputation: 2669
I am processing strings in R which are supposed to contain zero or one pair of parentheses. If there are nested parentheses I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos but not the other/outer parentheses.
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
I know I can remove all the parentheses with the stringr package using str_remove_all():
test |>
stringr::str_remove_all(stringr::fixed(")")) |>
stringr::str_remove_all(stringr::fixed("("))
but I don't have the regex skills to target only the inner parentheses. I found a SO post that is close, but it removes the outer parentheses, and I can't untangle it to remove the inner ones instead.
Upvotes: 3
Views: 294
Reputation: 18490
Interested in how this would be solved with multiple (...) groups inside the outer parentheses, I came up with the following lookahead-based idea. It only checks for an outer closing parenthesis, though.
test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)
See this R demo at tio.run or a pattern demo at regex101 (replace with \1, the capture of the first group).
At each (...), the lookahead verifies that it is followed only by further (...) groups or non-parenthesis characters up to a closing ).
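To see what the lookahead does when several (...) groups sit inside the outer pair, here is a small sketch. The pattern is the one shown above; the input vector `multi` and the helper name `drop_inner_parens` are made up for illustration.

```r
# Pattern from the answer above: an inner "(...)" is unwrapped only if the
# closing ")" of an outer pair still follows it.
inner_pattern <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

# Hypothetical helper name, used here only for illustration.
drop_inner_parens <- function(x) {
  gsub(inner_pattern, "\\1", x, perl = TRUE)
}

# Made-up inputs: two inner groups, two top-level groups, no parentheses.
multi <- c(
  "Best food? (choice=Tacos (big) and (bent) nachos)",
  "Best food? (choice=Tacos) (choice=Nachos)",
  "Complete?"
)

result <- drop_inner_parens(multi)
print(result)
```

In the first string both inner groups are unwrapped, because each is still followed by the outer closing parenthesis. In the second string nothing changes, because neither top-level group has an unmatched ) after it.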
If the nesting can be arbitrarily deep, flattening the first level can be done with a recursive regex.
test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)
One more R demo at tio.run or a regex101 demo (replace with \2, the second group's capture).
regex-part | explained |
---|---|
(?:\G(?!^)|\() | Matches an opening parenthesis, or uses \G to chain to the end of the previous match |
[^)(]*+\K | Consumes any amount of non-parentheses; \K resets the start of the reported match |
(\(((?>[^)(]+|(?1))*)\)) | Matches the nested parentheses (explanation at php.net). It contains two capture groups: the first is recursed at (?1); the second captures what is inside ( and ) |
Here the matches are chained to the opening parenthesis. There is no check for an outer closing ).
This \G-based idea can also be used without recursion for just one level, but it is slightly less efficient.
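As a regex-free cross-check of what "flattening the first level" means, the sketch below walks each string character by character and tracks the nesting depth. This is a swapped-in technique, not the recursive regex above; the function name `flatten_first_level` and the sample string are assumptions made for illustration, and the input is assumed to be balanced and free of NA values.

```r
# Hypothetical helper: drop the parentheses that open and close nesting
# level 2, keep the outer pair and anything nested deeper.
flatten_first_level <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    keep <- rep(TRUE, length(chars))
    depth <- 0L
    for (i in seq_along(chars)) {
      if (chars[i] == "(") {
        depth <- depth + 1L
        if (depth == 2L) keep[i] <- FALSE   # opening paren of level 2
      } else if (chars[i] == ")") {
        if (depth == 2L) keep[i] <- FALSE   # closing paren of level 2
        depth <- max(depth - 1L, 0L)        # never go below zero
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

flat <- flatten_first_level(c(
  "a (b (c (d)) e (f))",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "Complete?"
))
print(flat)
```

A loop like this is slower than a single gsub() call on long vectors, but it can be handy for validating a regex on a few tricky inputs.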
Upvotes: 2
Reputation: 1039
Here you go.
test |>
stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner open brackets
stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
Fixed my solution, so as to not lose text:
test |>
stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner open brackets
stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
Upvotes: 3
Reputation: 24069
Here is a solution using gsub from base R. It is broken down into 2 steps for readability and debugging.
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: a '(' followed by as many characters as possible
# \\(     - then another '(' (the last one in the string, since .* is greedy)
# "\\1"   - replace the whole match with the first group, which drops that '('
test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: drop a ')' that is followed by another ')', keep the rest
test
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
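Because both patterns are greedy, one pass of the two steps removes only one inner ( and one inner ). If a string could hold several inner pairs, one possible extension is to repeat the two steps until nothing changes. The sketch below assumes each string has at most one outer pair; the names `strip_once` and `strip_until_stable`, the `max_iter` cap, and the sample string are made up for illustration.

```r
# One pass of the two gsub() steps from above.
strip_once <- function(x) {
  x <- gsub("(\\(.*)\\(", "\\1", x)   # drop the last '(' that follows an earlier '('
  gsub("\\)(.*\\))", "\\1", x)        # drop the first ')' that precedes a later ')'
}

# Hypothetical wrapper: repeat until the vector stops changing.
strip_until_stable <- function(x, max_iter = 20L) {
  for (i in seq_len(max_iter)) {
    y <- strip_once(x)
    if (identical(y, x)) break
    x <- y
  }
  x
}

one_pass <- strip_once("x (a (b) c (d))")
stable <- strip_until_stable("x (a (b) c (d))")
print(one_pass)  # still contains one inner pair
print(stable)    # only the outer pair is left
```

Strings with two separate top-level groups would be damaged by this approach, so it only fits the "zero or one outer pair" assumption from the question.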
Upvotes: 1
Reputation: 520918
Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that it drops the inner parentheses together with their contents:
output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output
[1] "Record ID"
[2] "What is the best food? (choice=Nachos)"
[3] "What is the best food? (choice=Tacos)"
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
Data:
test <- c(
"Record ID",
"What is the best food? (choice=Nachos)",
"What is the best food? (choice=Tacos (big bent nachos))",
"What is the best food? (choice=Chips with stuff)",
"Complete?"
)
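If the inner text should be kept and only the inner parentheses dropped, a variant of the same pattern can capture the inner contents as well. This is a sketch under the same "at most one nested pair" assumption; the extra capture group, the single space in the replacement, and the name `keep_inner` are choices made here for illustration.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Group 1: text before the inner pair, group 2: text inside the inner pair,
# group 3: text after it. perl = TRUE is set so the lazy quantifiers are
# handled by PCRE.
keep_inner <- gsub(
  "\\(\\s*(.*?)\\s*\\((.*?)\\)(.*?)\\s*\\)",
  "(\\1 \\2\\3)",
  test,
  perl = TRUE
)
print(keep_inner)
```

Strings without a nested pair do not match the pattern and are returned unchanged. The whitespace before the inner ( is consumed by \s*, which is why the replacement re-inserts a single space.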
Upvotes: 1