itsMeInMiami

Reputation: 2669

How can I remove inner parentheses from an R string?

I am processing strings in R which are supposed to contain zero or one pair of parentheses. If there are nested parentheses I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos but not the other/outer parentheses.

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
) 

I know I can kill all the parentheses with the stringr package using str_remove_all():

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))

but I don't have the regex skills to pick out only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses and I can't untangle it to remove the inner ones instead.

Upvotes: 3

Views: 294

Answers (4)

bobble bubble

Reputation: 18490

Interested in how this would be solved with multiple (...) inside the outer parentheses, I came up with the following lookahead-based idea. It only checks for an outer closing parenthesis, though.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

See this R demo at tio.run or a pattern demo at regex101 (replace with \1, capture of first group)

At each inner (...), the lookahead verifies that everything up to a later closing ) consists only of non-parenthesis characters and further (...) groups.
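To make the behaviour concrete, here is a small sketch that runs the same pattern on a few made-up strings. The inputs in `multi` are hypothetical and not from the question; the last one is deliberately unbalanced to show what "only checks for an outer closing parenthesis" means in practice.

```r
# Pattern copied from the answer above; perl = TRUE is required for the lookahead.
inner_pat <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

# Hypothetical inputs (assumptions for illustration only)
multi <- c(
  "Best food? (choice=Tacos (big) and (bent) nachos)",  # two inner pairs
  "Best food? (choice=Nachos)",                         # no nesting
  "a (b) c)"                                            # no outer opening parenthesis
)

flat <- gsub(inner_pat, "\\1", multi, perl = TRUE)
print(flat)
# element 1: "Best food? (choice=Tacos big and bent nachos)"
# element 2: "Best food? (choice=Nachos)"   (unchanged: no ")" follows the pair)
# element 3: "a b c)"                       (stripped: only a later ")" is required)
```

If the input may be unbalanced like the third string, it is probably worth validating it before applying the replacement.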


If the nesting can be arbitrarily deep, flattening the first level can be done with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

One more R demo at tio.run or a regex101 demo (replace with \2, the second group's capture)

Regex parts explained:
• (?:\G(?!^)|\() matches an opening parenthesis, or continues where the previous match ended (\G), which chains the matches together.
• [^)(]*+\K consumes any number of non-parenthesis characters; \K then resets the start of the reported match.
• (\(((?>[^)(]+|(?1))*)\)) matches the nested parentheses (explanation at php.net). It contains two capture groups:
  • the first is the one recursed into at (?1)
  • the second captures the content inside the parentheses.

Here the matches are chained to the opening parenthesis. There is no check for an outer closing ). This \G-based idea can also be used without recursion for just one level, but it is slightly less efficient.
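A quick sketch of how the recursive pattern behaves on deeper nesting. The input strings are made up for illustration. `one_level_pat` is my own guess at what the non-recursive \G variant mentioned above might look like; it is an assumption, not something taken from the linked demos.

```r
# Recursive pattern copied from the answer above
rec_pat <- "(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))"

# Hypothetical input with three levels of nesting
deep <- "x (a (b (c)) d (e) f)"
deep_flat <- gsub(rec_pat, "\\2", deep, perl = TRUE)
print(deep_flat)
# "x (a b (c) d e f)"   only the first nested level is removed

# No check for an outer closing ")": an unclosed outer group is still processed
unclosed_flat <- gsub(rec_pat, "\\2", "x (a (b) c", perl = TRUE)
print(unclosed_flat)
# "x (a b c"

# Assumed non-recursive variant for a single level of nesting
one_level_pat <- "(?:\\G(?!^)|\\()[^)(]*+\\K\\(([^)(]*)\\)"
one_level <- gsub(one_level_pat, "\\1",
                  "Best food? (choice=Tacos (big) and (bent) nachos)",
                  perl = TRUE)
print(one_level)
# "Best food? (choice=Tacos big and bent nachos)"
```

Running the recursive replacement repeatedly until the string stops changing would presumably flatten every level, but I have not checked that against the demos.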

Upvotes: 2

Josh White

Reputation: 1039

Here you go.

test |>
  stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner opening bracket (also drops the text before it)
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (big bent nachos)"        
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

EDIT

Fixed my solution, so as to not lose text:

test |>
  stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner opening bracket, keeping the text before it
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?" 

Upvotes: 3

Dave2e

Reputation: 24069

Here is a solution using gsub() from base R. It is broken down into two steps for readability and debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

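# Step 1: remove the inner '('
test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: the first '(' followed by zero or more characters (greedy)
# \\(     - then another '(' (the inner one)
# "\\1"   - replace the whole match with the first group, which drops the inner '('

# Step 2: remove the inner ')'
test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: match the first ')' and capture everything up to
# the last ')', then keep only the captured group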
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"  

Upvotes: 1

Tim Biegeleisen

Reputation: 520918

Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that it removes the inner parentheses together with their contents:

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)

Upvotes: 1
