Reputation: 21347
I have a question similar to "How to split a string, but also keep the delimiters?". How would I split a String using a regex, keeping some types of delimiters but not others? Specifically, I want to keep the non-whitespace delimiters but discard the whitespace delimiters.
To make this concrete:
"a;b c" | ["a", ";", "b", "c"]
"a; ; bb c ;d" | ["a", ";", ";", "bb", "c", ";", "d"]
Can this be done cleanly with a regex, and if so how?
Right now I'm working around this by splitting on the character to keep, and then again on the other one. I can stick with this approach if the regex cannot do so, or cannot do so cleanly:
Arrays.stream(input.split("((?<=;)|(?=;))"))
.flatMap(s -> Arrays.stream(s.split("\\s+")))
.filter(s -> !s.isEmpty())
.toArray(String[]::new); // In practice, I would generally use .collect(Collectors.toList()) instead
Upvotes: 2
Views: 891
Reputation: 89639
You can do it this way:
System.out.println(String.join("-", "a; ; b c ;d".split("(?!\\G) *(?=;)|(?<=;) *| +")));
details:
(?!\\G) # not contiguous to a previous match and not at the start of the string
[ ]* # optional spaces
(?=;) # followed by a ;
| # OR
(?<=;) # preceded by a ;
[ ]* # optional spaces
| # OR
[ ]+ # several spaces
Feel free to change the literal space to \\s. To avoid an empty item at the beginning of the resulting array when the string starts with whitespace, you need to trim the string first.
Obviously, without the constraint of splitting, @alphabravo's way is the simplest.
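For anyone who wants to try this directly, here is a minimal sketch that wraps the split pattern above in a helper. It assumes the literal space has been swapped for \\s (as suggested) and that the input is trimmed first. The class name `SplitKeepSemicolons` and the method name `tokenize` are made up for illustration.

```java
import java.util.Arrays;

class SplitKeepSemicolons {
    // Same three alternatives as the pattern above, with \\s in place of the literal space:
    //   (?!\\G)\\s*(?=;)  optional whitespace before a ';' (not right after a previous match)
    //   (?<=;)\\s*        optional whitespace after a ';'
    //   \\s+              plain whitespace between other tokens
    static final String DELIMITER = "(?!\\G)\\s*(?=;)|(?<=;)\\s*|\\s+";

    static String[] tokenize(String input) {
        // Trim first so leading whitespace does not produce an empty first element.
        return input.trim().split(DELIMITER);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("a;b c")));
        System.out.println(Arrays.toString(tokenize("a; ; bb c ;d")));
    }
}
```

The semicolons survive because the first two alternatives only look at them through lookarounds and consume nothing but the surrounding whitespace.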
Upvotes: 2
Reputation: 48751
Borrowing @CasimiretHippolyte's \G trick, you may want to split on:
\\s+|(?!\\G)()
Note: no delimiters are specified.
To avoid splitting on leading spaces:
(?m)(?<!^|\\s)(\\s+|)(?!$)
Upvotes: 0
Reputation:
After realizing that Java doesn't support adding captured delimiter characters to the elements of the split array, I thought I'd try a split solution that works without that capability.
Basically there are only 4 permutations involving whitespace and the semicolon.
Finally, there is just the whitespace.
Here is the regex.
Raw: \s+(?=;)|(?<=;)\s+|(?<!\s)(?=;)|(?<=;)(?!\s)|\s+
Stringed: "\\s+(?=;)|(?<=;)\\s+|(?<!\\s)(?=;)|(?<=;)(?!\\s)|\\s+"
And the expanded regex with the permutations explained.
Good luck!
\s+ # Required, suck up wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
\s+ # Required, suck up wsp after ;
| # or,
(?<! \s ) # No wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
(?! \s ) # No wsp after ;
| # or,
\s+ # Required wsp
Edit
To stop a split on whitespace at the beginning of the string (BOS), use this regex.
Raw: \s+(?=;)|(?<=;)\s+|(?<!\s)(?=;)|(?<=;)(?!\s)|(?<!^)(?<!\s)\s+
Stringed: "\\s+(?=;)|(?<=;)\\s+|(?<!\\s)(?=;)|(?<=;)(?!\\s)|(?<!^)(?<!\\s)\\s+"
Explained:
\s+ # Required, suck up wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
\s+ # Required, suck up wsp after ;
| # or,
(?<! \s ) # No wsp before ;
(?= ; ) # ;
| # or,
(?<= ; ) # ;
(?! \s ) # No wsp after ;
| # or,
(?<! ^ ) # No split of wsp at BOS
(?<! \s )
\s+ # Required wsp
Upvotes: 1
Reputation: 7948
I suggest capturing what you want instead of splitting, using this simple pattern:
([^; ]+|;)
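A minimal sketch of how that capture approach could look in Java, using `Matcher.find()` to collect each token. Two things here are assumptions, not part of the pattern above: the literal space in the character class is replaced with \\s so that any whitespace is skipped, and the outer capturing group is dropped because `group()` already returns the whole match. The class name `TokenCapture` and the method name `tokenize` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCapture {
    // A token is either a run of characters that are neither ';' nor whitespace,
    // or a single ';'. Whitespace never matches, so it is simply skipped.
    private static final Pattern TOKEN = Pattern.compile("[^;\\s]+|;");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a;b c"));        // prints [a, ;, b, c]
        System.out.println(tokenize("a; ; bb c ;d")); // prints [a, ;, ;, bb, c, ;, d]
    }
}
```

Because nothing is split, leading or trailing whitespace cannot produce empty elements, so no trimming or filtering is needed.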
Upvotes: 3
Reputation: 1246
I found a regex that works:
(\\s+)|((?<=;)(?=\\S)|(?<=\\S)(?=;))
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        System.out.println(Arrays.toString("a; ; b c ;d"
                .split("(\\s+)|((?<=;)(?=\\S)|(?<=\\S)(?=;))")));
    }
}
Will print out:
[a, ;, ;, b, c, ;, d]
Upvotes: 2
Reputation: 425358
You want to split on whitespace, or between a word character and a non-word character:
str.split("\\s+|(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
Upvotes: 1