user3833308

Reputation: 1212

Java: split a string by whitespace and punctuation, but keep only the punctuation in the result

hello-world how are you?

should result in

hello
-
world
how
are
you
?

This is the code I tried, but it also returns the spaces between words as separate tokens:

String str = "Hello-world how are you?";
Arrays.stream(str.split("\\b+")).forEach(System.out::println);
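Printing each token between brackets makes the problem with this attempt visible. A minimal sketch (the class name WordBoundarySplit is only illustrative) that splits the same way and shows every piece:

```java
import java.util.Arrays;
import java.util.List;

public class WordBoundarySplit {
    // Splits at every word boundary, exactly like the attempt above.
    static List<String> tokens(String s) {
        return Arrays.asList(s.split("\\b+"));
    }

    public static void main(String[] args) {
        // Brackets make the single-space tokens visible.
        for (String t : tokens("Hello-world how are you?")) {
            System.out.println("[" + t + "]");
        }
    }
}
```

The " " pieces are the separators left over between word boundaries, so they either have to be filtered out afterwards or avoided by splitting on the whitespace itself.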

Upvotes: 0

Views: 1146

Answers (4)

anubhava

Reputation: 785481

You can use this regex for splitting:

String str = "hello-world how are you?";
Arrays.stream(str.split("\\p{javaWhitespace}+|(?=\\p{P})|(?<=\\p{P})")).forEach(System.out::println);

Here \\p{javaWhitespace}+|(?=\\p{P})|(?<=\\p{P}) splits on any run of whitespace (as defined by Character.isWhitespace), or, with the help of a lookahead and a lookbehind, at any position where the next or previous character is a punctuation character.


Output:

hello
-
world
how
are
you
?
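The same split can be wrapped in a small reusable method. This sketch assumes illustrative names (PunctuationSplit, tokens) and adds one extra step: it drops empty strings, which this regex can produce when whitespace is directly followed by punctuation (for example, "are ?" splits into "are", "" and "?") or when the input starts with whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class PunctuationSplit {
    // Split on runs of whitespace, or at a zero-width position that is
    // right before (lookahead) or right after (lookbehind) a punctuation char.
    private static final Pattern SPLITTER =
            Pattern.compile("\\p{javaWhitespace}+|(?=\\p{P})|(?<=\\p{P})");

    static List<String> tokens(String s) {
        return Arrays.stream(SPLITTER.split(s))
                // Drop empty pieces, e.g. between a space and a following "?"
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        tokens("hello-world how are you?").forEach(System.out::println);
    }
}
```

Compiling the Pattern once avoids recompiling the regex on every call, which String.split would otherwise do for a multi-character regex.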

Upvotes: 2

Wiktor Stribiżew

Reputation: 627087

A much simpler regex solution is possible with a matching approach:

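import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "Hello-world how are yóu?";
List<String> res = new ArrayList<>();
// Match a run of word characters, or a single punctuation character
Matcher m = Pattern.compile("(?U)\\w+|\\p{Punct}").matcher(str);
while (m.find()) {
    res.add(m.group());
}
System.out.println(res);
// => [Hello, -, world, how, are, yóu, ?]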


Details:

  • (?U) - the embedded flag for Pattern.UNICODE_CHARACTER_CLASS (so that \w can match Unicode letters)
  • \\w+ - one or more word characters (letters, digits, or _; to exclude _, use [\\w&&[^_]] or [^\\W_])
  • | - or
  • \\p{Punct} - a punctuation character (can be replaced with [\\p{P}\\p{S}] to also match symbols such as $ or +).
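Combining the two alternatives mentioned in the list gives a variant that keeps _ out of word tokens and also matches symbols. A sketch under those assumptions (the class name TokenMatcher and the sample input are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // [^\W_]+ : one or more word characters except the underscore
    // [\p{P}\p{S}] : a single punctuation or symbol character
    private static final Pattern TOKEN =
            Pattern.compile("(?U)[^\\W_]+|[\\p{P}\\p{S}]");

    static List<String> tokens(String s) {
        List<String> res = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            res.add(m.group());
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(tokens("snake_case costs $5+tax?"));
        // => [snake, _, case, costs, $, 5, +, tax, ?]
    }
}
```

With the original \\w+ alternative, snake_case would stay a single token, since _ counts as a word character.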

Upvotes: 1

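String str = "Hello-world how are you?";
Arrays.stream(str.split("\\b+")).forEach(w -> {
    // Skip the whitespace-only pieces between words
    // (trim() also covers runs of several spaces or tabs)
    if (!w.trim().isEmpty())
        System.out.println(w);
});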

Upvotes: 1

Pr3ds

Reputation: 358

Use split; it breaks the string at each occurrence of the separator.

public static void main(String[] args) {
    String test = "hello - word bla bla bla";
    String[] values = test.split(" ");

    for (String element : values) {
        System.out.println(element);
    }
}

Upvotes: -1
