Reputation: 65
I have the following string:
x <- "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554"
# [1] "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554"
and I want to split it on spaces, ignoring any spaces inside the parentheses, to get something like:
[[1]]
[1] "(((K05708+K05709+K05710 K00529) K05711),K05712)"
[2] "K05713" "K05714"
[4] "K02554"
Note that the two spaces inside the outermost pair of parentheses remain.
I read the following answers but couldn't make them work in my case: "r split on delimiter not in parentheses" and "Using strsplit() in R, ignoring anything in parentheses".
Thanks in advance!
Upvotes: 3
Views: 1404
Reputation: 626699
I think you need a PCRE-based regex that first matches balanced parentheses and skips them, and then matches the whitespace that remains:
(\((?:[^()]++|(?1))*\))(*SKIP)(*F)|\s
See the regex demo (the space is replaced with \s above for better visibility).
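To check which whitespace characters the pattern actually selects, the match positions can be inspected directly in R. This is only a quick sanity check, assuming the string from the question; the variable names (s, pattern, positions, inner_space) are mine and not part of the original answer.

```r
# The string from the question and the PCRE pattern (backslashes doubled for R)
s <- "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554"
pattern <- "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|\\s"

# gregexpr() returns the start position of every match; perl = TRUE enables PCRE
positions <- as.integer(gregexpr(pattern, s, perl = TRUE)[[1]])
print(positions)

# Position of the space that sits inside the parentheses (just before K05711)
inner_space <- as.integer(regexpr(" K05711", s, fixed = TRUE))

# FALSE means the inner space was skipped and will not be used as a split point
print(inner_space %in% positions)
```

Only the three spaces outside the parenthesized block should show up as matches, which is why strsplit() leaves the first element intact.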
Pattern details:
(\((?:[^()]++|(?1))*\))(*SKIP)(*F) - Group 1 matches \((?:[^()]++|(?1))*\), a substring with balanced parentheses: \( matches a literal (, then (?:[^()]++|(?1))* matches zero or more (*) repetitions of either 1+ chars other than ( and ) (see [^()]++) or the whole Group 1 pattern again (see the subroutine call (?1)), and then \) matches a literal ). The (*SKIP)(*F) verbs make the regex discard the whole matched text while keeping the regex index at the end of that match, and proceed looking for the next match
| - or
\s - a whitespace character to split against (a plain space in the R code below)

Here is an online R demo:
s <- "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554"
strsplit(s, "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)| ", perl=TRUE)
Output:
[[1]]
[1] "(((K05708+K05709+K05710+K00529) K05711),K05712)"
[2] "K05713"
[3] "K05714"
[4] "K02554"
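If the recursive regex is hard to maintain, the same result can be cross-checked without PCRE recursion by counting parenthesis depth. This is a different technique from the regex above, sketched under the assumption that the parentheses are balanced; split_outside_parens is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: split on spaces that are at parenthesis depth 0.
# Assumes balanced parentheses; runs of consecutive outer spaces yield no empty tokens.
split_outside_parens <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # Nesting depth after each character: +1 for "(", -1 for ")"
  depth <- cumsum((chars == "(") - (chars == ")"))
  # A space is a split point only when no parenthesis is open
  is_split <- chars == " " & depth == 0
  # Every split point starts a new group; the split characters themselves are dropped
  group <- cumsum(is_split)
  pieces <- split(chars[!is_split], group[!is_split])
  unname(vapply(pieces, paste, character(1), collapse = ""))
}

s <- "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554"
res <- split_outside_parens(s)
print(res)

# Compare with the regex-based split from the answer
regex_res <- strsplit(s, "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)| ", perl = TRUE)[[1]]
print(identical(res, regex_res))
```

The depth counter is more verbose than the one-line strsplit() call, but it avoids backtracking control verbs and is easy to extend, for example to other bracket types.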
Upvotes: 3