Alex Conroy

Reputation: 13

Java: How to split and keep delimiter

I can't find the exact way to solve this issue I'm having. I want to split a sentence which will have spaces and can have punctuation marks. I want to keep the words and punctuation marks and store them in a single array.

 Example sentence:
 We have not met, have we?

 Desired array:
{"We", "have", "not", "met", ",", "have", "we", "?"}

I'm trying to split the sentence with a single String split call. I've looked through other related questions on Stack Overflow and I haven't been able to find a regex that works for my case, especially for the question mark.

Upvotes: 1

Views: 2826

Answers (2)

Wiktor Stribiżew

Reputation: 626689

You may try splitting on whitespace or at the positions right before non-word characters:

\s+|(?=\W)

See the regex demo

Pattern details: \s+|(?=\W) contains two alternatives separated with the | symbol. \s+ matches 1 or more whitespace characters, which are removed when splitting. (?=\W) is a positive lookahead that matches an empty string (a zero-width position) right before the pattern it contains, so nothing is consumed and the character after the split point is kept. Here, \W matches any non-word character (not a letter, digit, or underscore).

NOTE: If a non-word \W class is too "greedy" for you, you may use a punctuation class, \p{P} (String pattern = "\\s+|(?=\\p{P})") to only split before punctuation.
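To see where the two classes differ, here is a small sketch comparing both patterns on an input that contains a math symbol. The class name, method names, and the sample string "x+y, ok?" are made up for illustration. It relies on '+' being a Unicode symbol (category Sm) rather than punctuation, so \p{P} leaves it alone while \W splits before it.

```java
import java.util.Arrays;

// Illustrative comparison; class name, method names and sample input are
// assumptions for this sketch, not part of the original question.
final class SplitClassComparison {

    // Split on whitespace, or at the position before any non-word character.
    static String[] splitBeforeNonWord(String s) {
        return s.split("\\s+|(?=\\W)");
    }

    // Split on whitespace, or at the position before Unicode punctuation only.
    static String[] splitBeforePunctuation(String s) {
        return s.split("\\s+|(?=\\p{P})");
    }

    public static void main(String[] args) {
        String sample = "x+y, ok?";
        // \W also treats '+' as a split point:
        System.out.println(Arrays.toString(splitBeforeNonWord(sample)));
        // => [x, +y, ,, ok, ?]
        // \p{P} only reacts to ',' and '?':
        System.out.println(Arrays.toString(splitBeforePunctuation(sample)));
        // => [x+y, ,, ok, ?]
    }
}
```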

IDEONE Java demo:

// Requires: import java.util.Arrays;
String str = "We have not met, have we?";
String[] chunks = str.split("\\s+|(?=\\W)");
System.out.println(Arrays.toString(chunks));
// => [We, have, not, met, ,, have, we, ?]
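One edge case may be worth checking with this split: when whitespace comes directly before a punctuation mark (as in "we ?"), the lookahead still matches at the position right after the whitespace, so the result contains an empty string. The sketch below shows this and one possible workaround, which is to forbid the zero-width split immediately after whitespace with a negative lookbehind. The class and method names are hypothetical, and the adjusted pattern is a suggestion rather than part of the answer above.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the empty-element edge case.
final class SplitEdgeCase {

    // Pattern from the answer: whitespace, or the position before a non-word char.
    static String[] splitOriginal(String s) {
        return s.split("\\s+|(?=\\W)");
    }

    // Adjusted pattern (an assumption of this sketch): do not split at a
    // zero-width position that directly follows whitespace.
    static String[] splitNoEmpty(String s) {
        return s.split("\\s+|(?<!\\s)(?=\\W)");
    }

    public static void main(String[] args) {
        String sample = "have we ?";
        System.out.println(Arrays.toString(splitOriginal(sample)));
        // => [have, we, , ?]   (note the empty element before "?")
        System.out.println(Arrays.toString(splitNoEmpty(sample)));
        // => [have, we, ?]
    }
}
```

Filtering out empty strings after the split would be another option if changing the pattern is not desirable.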

If you need to tokenize the non-whitespace/non-word chunks as whole units (say, ?!! as one array element), use this matching technique:

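// Requires: import java.util.*; import java.util.regex.*;
// First alternative: 1+ word chars; second: any remaining run of non-whitespace.
Pattern ptrn = Pattern.compile("[^\\s\\W]+|\\S+");
Matcher m = ptrn.matcher("We have not met, have we?!!");
List<String> list = new ArrayList<>();
while (m.find()) {
    list.add(m.group(0));
}
System.out.println(list); // => [We, have, not, met, ,, have, we, ?!!]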

See another IDEONE demo and a regex demo.
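The matching approach can be wrapped in a small reusable method, sketched below. The class name, the tokenize helper, and the second pattern are assumptions added for illustration. The variant pattern \w+|[^\w\s]+ is not from the answer: it behaves the same on the sample sentence, but differs when word characters directly follow punctuation (for example "a,b"), where \S+ in the answer's pattern would keep ",b" together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the matching technique.
final class MatchTokenizer {

    // Pattern from the answer: word chars, else any run of non-whitespace.
    static final Pattern ANSWER_PATTERN = Pattern.compile("[^\\s\\W]+|\\S+");

    // Variant (an assumption of this sketch): word chars, else a run of
    // characters that are neither word chars nor whitespace.
    static final Pattern VARIANT_PATTERN = Pattern.compile("\\w+|[^\\w\\s]+");

    // Collect every match of the pattern, in order, as a token.
    static List<String> tokenize(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(VARIANT_PATTERN, "We have not met, have we?!!"));
        // => [We, have, not, met, ,, have, we, ?!!]
        System.out.println(tokenize(ANSWER_PATTERN, "a,b"));  // => [a, ,b]
        System.out.println(tokenize(VARIANT_PATTERN, "a,b")); // => [a, ,, b]
    }
}
```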

Upvotes: 2

suulisin

Reputation: 1434

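// Note: this input has a space before "?", unlike the sentence in the question.
String sentence = "We have not met, have we ?";
String[] splited = sentence.split("\\s+");
// splited is {"We", "have", "not", "met,", "have", "we", "?"}; the comma stays attached to "met"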

Upvotes: 0
