Reputation: 47
Hello, I have a string representing a Linux command, like this:
x <- "cd/etc/init[BKSP][BKSP]it.d[ENTER]"
I want to split the string into individual characters while keeping the content inside the square brackets intact. Basically, I want to retain each command key press as a single element. With str_split, the result would look something like this:
c("c","d","/","e","t","c","i",......"BKSP","BKSP","i","t",".", "d", "ENTER")
Can someone help me with this problem? I've been playing with regex and haven't figured out how to achieve this.
I tried /.*?[^[A-Z*?]/ but it didn't do the trick. I'm also trying to add the delimiter to the matching group to split the string.
Upvotes: 1
Views: 277
Reputation: 626699
Match and capture all substrings inside brackets, and also capture any other single character, inside a branch reset group. Then remove the outer square brackets from the matches found:
> x <- c("cd/etc/init[BKSP][BKSP]it.d[ENTER]", "abc]", "[abc")
> matches <- regmatches(x, gregexpr("(?s)(?|\\[([^][]*)]|(.))", x, perl=TRUE))
> sapply(matches, sub, pattern="\\[(.*)\\]", replacement="\\1")
[[1]]
[1] "c" "d" "/" "e" "t" "c" "/" "i" "n" "i" "t" "BKSP" "BKSP" "i" "t" "." "d" "ENTER"
[[2]]
[1] "a" "b" "c" "]"
[[3]]
[1] "[" "a" "b" "c"
See the regex demo. Details:
(?s) - a DOTALL modifier to let . match line break chars, too
(?|\[([^][]*)]|(.)) - a branch reset group, where capturing groups in different alternation branches have identical IDs:
  \[([^][]*)] - a [ followed by any 0+ chars other than [ and ], captured into Group 1, and then a ]
  | - or
  (.) - any char (again, Group 1).
Upvotes: 2
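Since regmatches() returns whole matches rather than capture groups, the same tokens can also be obtained with a plain alternation and an anchored bracket-stripping step. The following is only a sketch of that variant, not the answer's exact code; the function name split_keys is hypothetical.

```r
# Hypothetical helper: split each string into single characters,
# keeping [BRACKETED] key names together as one element.
split_keys <- function(x) {
  # Either a bracketed token without nested brackets, or any single character
  tokens <- regmatches(x, gregexpr("(?s)\\[[^][]*\\]|.", x, perl = TRUE))
  # Strip the enclosing brackets only from tokens that have both of them,
  # so a lone "[" or "]" is left untouched
  lapply(tokens, function(tok) sub("(?s)^\\[(.*)\\]$", "\\1", tok, perl = TRUE))
}

res <- split_keys(c("it[BKSP].d[ENTER]", "abc]", "[abc"))
```

The result is a list with one character vector per input string, so res[[1]] holds the tokens of the first string.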
Reputation: 65
Here is my rather long-winded solution to your problem (I adapted the approach from this post: split string with regex)
x <- "[a] + [bc] + 1"
foo <- function(x){
  #Mark square brackets with commas
  #(note: commas already present in x are also treated as separators and dropped)
  x <- gsub("\\[",",[",x)
  x <- gsub("\\]","],",x)
  #Separate the string based on commas
  x <- unlist(strsplit(x,","))
  #Find which vector elements contain brackets
  ind <- grepl("\\[.*\\]", x)
  y <- character()
  for (a in seq_along(x)){
    #Skip the empty strings produced by adjacent commas
    if (nchar(x[a])!=0){
      if (ind[a]){
        y <- c(y, x[a]) #Store original character
      } else {
        y <- c(y, unlist(strsplit(x[a], ""))) #Store split character
      }
    }
  }
  #Remove the brackets
  y <- gsub("\\[", "", y)
  y <- gsub("\\]", "", y)
  return(y)
}
#Apply the function to the example string
foo(x)
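Because the function above uses commas as markers, any comma already in the input is consumed by the split. A possible variant of the same mark-and-split idea is sketched below; it assumes the control character "\001" never occurs in the input, and the name split_marked and its sep argument are hypothetical.

```r
# Hypothetical variant of the mark-and-split approach.
# Assumption: the separator "\001" does not appear in the input string.
split_marked <- function(x, sep = "\001") {
  # Surround every complete [..] token (no nested brackets) with the separator
  x <- gsub("(\\[[^][]*\\])", paste0(sep, "\\1", sep), x)
  # Split on the separator and drop the empty pieces between adjacent tokens
  chunks <- unlist(strsplit(x, sep, fixed = TRUE))
  chunks <- chunks[nzchar(chunks)]
  # Chunks that are a complete bracketed token
  is_key <- grepl("^\\[.*\\]$", chunks)
  out <- lapply(seq_along(chunks), function(i) {
    if (is_key[i]) {
      substr(chunks[i], 2, nchar(chunks[i]) - 1)  # strip the brackets
    } else {
      strsplit(chunks[i], "")[[1]]                # split into characters
    }
  })
  unlist(out)
}

res <- split_marked("a,b[BKSP]c")
```

Here the comma survives as an ordinary character, and unmatched brackets such as the one in "abc]" are kept as single characters.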
Upvotes: 0