TuD

Reputation: 47

Regex to split a string by character while retaining content inside square brackets

Hello, I have a string representing a Linux command, like this:

x <- "cd/etc/init[BKSP][BKSP]it.d[ENTER]"

I want to split the string by character and keep the content inside the square brackets intact. Basically, I want to retain each command key press as a single element. The result would look something like this if using str_split:

c("c","d","/","e","t","c","i",......"BKSP","BKSP","i","t",".", "d", "ENTER")

Can someone help me with this problem? I've been playing with regex and haven't figured out how to achieve this.

I tried /.*?[^[A-Z*?]/ but it didn’t do the trick. I’m also trying to add the delimiter to the matching group to split the string.

Upvotes: 1

Views: 277

Answers (2)

Wiktor Stribiżew

Reputation: 626699

Match and capture all substrings inside brackets, and also capture any other character, inside a branch reset group. Then remove the outer square brackets from the matches found:

> x <- c("cd/etc/init[BKSP][BKSP]it.d[ENTER]", "abc]", "[abc")
> matches <- regmatches(x, gregexpr("(?s)(?|\\[([^][]*)]|(.))", x, perl=TRUE))
> sapply(matches, sub, pattern="\\[(.*)\\]", replacement="\\1")
[[1]]
 [1] "c"     "d"     "/"     "e"     "t"     "c"     "/"     "i"     "n"     "i"     "t"     "BKSP"  "BKSP"  "i"     "t"     "."     "d"     "ENTER"

[[2]]
[1] "a" "b" "c" "]"

[[3]]
[1] "[" "a" "b" "c"

See the regex demo. Details:

  • (?s) - a DOTALL modifier to let . match line break chars, too
  • (?|\[([^][]*)]|(.)) - a branch reset group, where capturing groups in different alternation branches share identical IDs:
    • \[([^][]*)] - a [ followed by any 0+ chars other than [ and ] captured into Group 1, and then ]
    • | - or
    • (.) - any char (again, Group 1).
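
As a rough sketch of how the steps above might be wrapped up for reuse: since regmatches() returns whole matches rather than capture groups, a plain alternation appears to be enough here. The function name split_keys is hypothetical and not part of the answer above.

```r
# split_keys is a hypothetical helper name, not from the original answer.
split_keys <- function(x) {
  # (?s) lets . match line breaks; the first branch grabs a whole "[...]"
  # token, the second branch falls back to any single character.
  tokens <- regmatches(x, gregexpr("(?s)\\[[^][]*\\]|.", x, perl = TRUE))
  # Strip the enclosing brackets from tokens that are fully bracketed.
  lapply(tokens, function(tok) sub("^\\[(.*)\\]$", "\\1", tok))
}

split_keys("cd/etc/init[BKSP][BKSP]it.d[ENTER]")[[1]]
```

The function returns a list with one character vector per input string, so unbalanced inputs such as "abc]" or "[abc" simply fall through to the single-character branch.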

Upvotes: 2

stellaria

Reputation: 65

Here is my rather long-winded solution to your problem (I adapted the approach from this post: split string with regex):

x <- "[a] + [bc] + 1"

foo <- function(x){
  #Mark square brackets with commas (assumes the input contains no commas of its own)
  x <- gsub("\\[",",[",x)
  x <- gsub("\\]","],",x)

  #Separate the string based on commas
  x <- unlist(strsplit(x,","))

  #Find which vector elements contain brackets
  ind <- grepl("\\[.*\\]", x)

  y <- character()
  for (a in seq_along(x)){
    if (nchar(x[a])!=0){
      if (ind[a]){
        y <- c(y, x[a]) #Store original character
      } else {
        y <- c(y, unlist(strsplit(x[a], ""))) #Store split character
      }
    }
  }

  #Remove the brackets
  y <- gsub("\\[", "", y)
  y <- gsub("\\]", "", y)
  return(y)
}

foo(x)
# c("a", " ", "+", " ", "bc", " ", "+", " ", "1")
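
Because the comma placeholder would also split on any comma already present in the input, here is a hedged sketch of the same chunk-then-expand idea that extracts the chunks directly instead. The name split_chunks is hypothetical, and the sketch assumes brackets are balanced and not nested (stray brackets are dropped).

```r
# split_chunks is a hypothetical name; assumes balanced, non-nested brackets.
split_chunks <- function(x) {
  # Each chunk is either one whole "[...]" token or a run of characters
  # that contains no brackets at all.
  chunks <- regmatches(x, gregexpr("\\[[^][]*\\]|[^][]+", x))[[1]]
  is_key <- grepl("^\\[", chunks)

  pieces <- lapply(seq_along(chunks), function(i) {
    if (is_key[i]) {
      # Keep the bracketed token whole, minus its brackets
      substr(chunks[i], 2, nchar(chunks[i]) - 1)
    } else {
      # Split everything else into single characters
      strsplit(chunks[i], "")[[1]]
    }
  })
  unlist(pieces)
}

split_chunks("[a], [bc] + 1")
```

With this variant the comma in the example input should survive as an ordinary character rather than acting as a separator.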

Upvotes: 0
