Peter111

Reputation: 823

Java Regex, Split String at punctuation marks except in brackets

I have this String: "Hello, my Name is [[Peter.java]]."

The desired split is: [Hello, my, Name, is, [[Peter.java]]]

I want to split at punctuation marks and whitespace, but completely ignore anything inside the double brackets.

I tried:

string.split("(?!\\[\\[.*\\]\\])\\s*(\\,|\\.|\\s)\\s*")

but this doesn't work: the output is [Hello, my, Name, is, [[Peter, java]]], so the text inside the brackets is split as well. Can you help me?

Other examples:

"Hello. My name is [[Peter.java]]" --> [Hello, My, name, is, [[Peter.java]]]

"Hi. How, [[are,you]]" --> [Hi, How, [[are,you]]]

Upvotes: 2

Views: 972

Answers (2)

Federico Piazza

Reputation: 30985

You can use this regex to split:

[.,\s]+(?!\w+])


The code:

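import java.util.Arrays;

public void testRegex() {
    String str = "Hello. my Name is [[Peter.java]].";

    // Split on runs of '.', ',' or whitespace, unless the run is
    // directly followed by word characters and a closing ']'.
    String[] arr = str.split("[.,\\s]+(?!\\w+])");

    System.out.println(Arrays.toString(arr));
}
// Output: [Hello, my, Name, is, [[Peter.java]]]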

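Edit: as HamZa pointed out in his comment, the regex above fails if the string is something like "something, like this]", because a word followed by a single ] is treated as bracket content. In an engine that supports the PCRE verbs (*SKIP)(*FAIL), the regex can be improved as follows (note that java.util.regex does not support these verbs):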

\[\[.*?\]\]     # Match our brackets
(*SKIP)(*FAIL)  # Skip that match and proceed further
|               # or
[\s.,]+         # one or more of: whitespace (\n, \r, \t, \f, " "), '.', ','
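
Since the question is about Java, here is a rough sketch of one way to get a similar effect with plain String splitting. It swaps the PCRE verbs for a negative lookahead that rejects a delimiter whenever a closing "]]" follows before any other bracket. It assumes the [[...]] pairs are balanced and never nested; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BracketAwareSplit {
    // A run of delimiters that is NOT followed by "]]" before any other
    // bracket character, i.e. a run that is not inside a [[...]] pair.
    // Assumption: brackets are balanced and not nested.
    private static final Pattern DELIMITERS =
            Pattern.compile("[\\s.,]+(?![^\\[\\]]*\\]\\])");

    static String[] split(String input) {
        // Pattern.split drops trailing empty strings by default.
        return DELIMITERS.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Hello, my Name is [[Peter.java]].")));
        // [Hello, my, Name, is, [[Peter.java]]]
        System.out.println(Arrays.toString(split("Hi. How, [[are,you]]")));
        // [Hi, How, [[are,you]]]
        System.out.println(Arrays.toString(split("something, like this]")));
        // [something, like, this]]
    }
}
```

The lookahead scans to the end of the input for every candidate delimiter, so this is probably fine for short strings but may be slow on very long ones.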


Upvotes: 1

Strikeskids

Reputation: 4052

Instead of using String.split, you'll probably want a regex that matches the tokens themselves rather than the delimiters between them:

/\[\[(.*?)\]\]|(\w+)\W/g


Then use a java.util.regex.Matcher and call find() in a loop to iterate through the matches.
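
The matcher loop might look roughly like the sketch below. It is an adaptation rather than the exact regex above: the capture groups and the trailing \W are dropped so that a word at the very end of the input is still matched, and the whole match (brackets included) is collected. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokenizer {
    // Either a complete [[...]] group (lazy, so it stops at the first "]]")
    // or a plain run of word characters. The bracket alternative comes
    // first so it wins whenever a match starts at "[[".
    private static final Pattern TOKEN =
            Pattern.compile("\\[\\[.*?\\]\\]|\\w+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group()); // whole match, brackets included
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello. My name is [[Peter.java]]"));
        // [Hello, My, name, is, [[Peter.java]]]
        System.out.println(tokenize("Hi. How, [[are,you]]"));
        // [Hi, How, [[are,you]]]
    }
}
```

Matching tokens instead of splitting means anything that is neither a word character nor a bracket group is silently skipped, which may or may not be what you want for other punctuation.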

Upvotes: 1
