zappee

Reputation: 22668

Split a string using space when not surrounded by specific characters

I need to split a string on spaces but keep together any words surrounded by a specific marker. The marker can be `, * or **.

Let me give an example:

The `String class` represents character strings.
All *string literals* in **Java programs**, such as **abc**

I want to have this result:

The
`String class`
represents
character
strings.
All
*string literals*
in
**Java programs**
,
such
as
**abc**

I am able to write a regexp that splits my input string into parts if I have only one kind of marker character. Unfortunately, I have multiple markers.

This is the regexp I use in my code: [^\s"]+|"[^"]*("|$). It works only with a single marker:

String marker = "`";
String data = "The `String class` represents character strings. All *string literals* in **Java programs**, such as **abc**...";

String regexp = "[^\\s" + marker + "]+|" + marker + "[^" + marker + "]*(" + marker +"|$)";
Pattern pattern = Pattern.compile(regexp);
Matcher regexMatcher = pattern.matcher(data);

while (regexMatcher.find()) {
    System.out.println(regexMatcher.group());
}

Output:

The
`String class`
...
*string
literals*
in
**Java
programs**,
such
as
**abc**...

I have tried to combine multiple markers, but the following does not work:

String marker = "`|\*"

I could write plain Java code to do this, but I thought a regexp would be easier. Now I am not so sure.

Upvotes: 2

Views: 64

Answers (1)

Ryszard Czech

Reputation: 18611

You can extract the tokens with this pattern:

`[^`]*`|(\*{1,2}).*?\1|\S+

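This pattern matches text between backticks, text between single or double asterisks, and any other run of non-whitespace characters. The alternatives are tried left to right, and the backreference \1 makes the closing marker the same as the opening one, so ** is only closed by **.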

In Java source code, double each backslash:

String regex = "`[^`]*`|(\\*{1,2}).*?\\1|\\S+";
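
For completeness, here is a minimal sketch of how that pattern might be plugged into the matcher loop from the question. The class name MarkerTokenizer and the method tokenize are made up for illustration; only the regex itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for the example.
class MarkerTokenizer {

    // Alternatives, tried left to right:
    //   `[^`]*`           text enclosed in backticks
    //   (\*{1,2}).*?\1    text enclosed in * or **, closed by the same marker
    //   \S+               any other run of non-whitespace characters
    private static final Pattern TOKEN =
            Pattern.compile("`[^`]*`|(\\*{1,2}).*?\\1|\\S+");

    // Collects every match of the pattern, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "The `String class` represents character strings. "
                + "All *string literals* in **Java programs**, such as **abc**";
        for (String token : tokenize(data)) {
            System.out.println(token);
        }
    }
}
```

For the sample sentence from the question, this prints one token per line, with `String class`, *string literals* and **Java programs** kept together and the comma after **Java programs** as a separate token. Two limitations worth knowing: a marker that is glued to preceding text, as in (`String class`), is consumed by \S+ and therefore split on the space, and because . does not match line terminators by default, an asterisk-delimited span cannot cross a line break.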

Upvotes: 1
