Reputation: 13
I can't find the exact way to solve this issue I'm having. I want to split a sentence which will have spaces and can have punctuation marks. I want to keep the words and punctuation marks and store them in a single array.
Example sentence:
We have not met, have we?
Desired array:
{"We", "have", "not", "met", ",", "have", "we", "?"}
I'm trying to split the sentence with a single String split call. I've looked through other related questions on Stack Overflow and I haven't been able to find a regex that works for me, especially for the question mark.
Upvotes: 1
Views: 2826
Reputation: 626689
You may try splitting on whitespace or at the positions before non-word characters:
\s+|(?=\W)
See the regex demo
Pattern details: \s+|(?=\W) contains two alternatives separated by the | symbol. \s+ matches one or more whitespace characters, which are removed when splitting. (?=\W) is a positive lookahead that matches only the empty position immediately before the pattern it contains, so nothing is consumed there; here, \W matches any non-word character (anything other than a letter, digit, or underscore).
NOTE: If the non-word class \W is too "greedy" for you, you may use the punctuation class \p{P} instead (String pattern = "\\s+|(?=\\p{P})") to split only before punctuation.
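To make the difference between the two classes concrete, here is a minimal sketch. The class and method names (PunctSplitDemo, splitBeforeNonWord, splitBeforePunctuation) and the sample input are made up for illustration. In Java, \p{P} is the Unicode punctuation category, so symbols such as + and = count as \W but not as \p{P}:

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative, not from the answer.
class PunctSplitDemo {
    // Split on whitespace, or at the position before any non-word character.
    static String[] splitBeforeNonWord(String text) {
        return text.split("\\s+|(?=\\W)");
    }

    // Split on whitespace, or at the position before Unicode punctuation only.
    static String[] splitBeforePunctuation(String text) {
        return text.split("\\s+|(?=\\p{P})");
    }

    public static void main(String[] args) {
        String text = "a+b=c, right?";
        System.out.println(Arrays.toString(splitBeforeNonWord(text)));
        // => [a, +b, =c, ,, right, ?]
        System.out.println(Arrays.toString(splitBeforePunctuation(text)));
        // => [a+b=c, ,, right, ?]
    }
}
```

Note that the lookahead only splits before the matched character, never after it, which is why +b and =c stay glued together in the first result.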
String str = "We have not met, have we?";
String[] chunks = str.split("\\s+|(?=\\W)");
System.out.println(Arrays.toString(chunks));
// => [We, have, not, met, ,, have, we, ?]
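One caveat worth checking with this split: when whitespace comes directly before a punctuation mark (as in "have we ?"), the whitespace match and the zero-width lookahead match sit next to each other, and split returns an empty string between them. The sketch below assumes Java 8 or later and uses a hypothetical helper, splitDroppingEmpty, to filter those empty tokens out:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; the name and method are illustrative only.
class SplitNoEmpty {
    // Same regex as above, with empty tokens removed afterwards.
    static List<String> splitDroppingEmpty(String text) {
        return Arrays.stream(text.split("\\s+|(?=\\W)"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The raw split keeps an empty token between the space and "?".
        System.out.println(Arrays.toString("have we ?".split("\\s+|(?=\\W)")));
        // => [have, we, , ?]
        System.out.println(splitDroppingEmpty("have we ?"));
        // => [have, we, ?]
    }
}
```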
If you need to tokenize the non-whitespace/non-word chunks as whole units (say, ?!! as one array element), use this matching technique:
Pattern ptrn = Pattern.compile("[^\\s\\W]+|\\S+");
Matcher m = ptrn.matcher("We have not met, have we?!!");
List<String> list = new ArrayList<>();
while (m.find()) {
list.add(m.group(0));
}
System.out.println(list); // => [We, have, not, met, ,, have, we, ?!!]
See another IDEONE demo and a regex demo.
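A possible refinement of the matching approach, offered as a sketch rather than as part of the original answer: because the second alternative is \S+, a punctuation run also swallows any word characters that follow it, so "(really)" comes out as one token. Swapping in \w+|[^\w\s]+ keeps runs like ?!! together but stops at the next word character. The class name PunctuationTokenizer and the method tokenize are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer; an assumed variation on the pattern shown above.
class PunctuationTokenizer {
    // A run of word characters, or a run of characters that are
    // neither word characters nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("We have not met, have we?!!"));
        // => [We, have, not, met, ,, have, we, ?!!]
        System.out.println(tokenize("Is it (really) so ?"));
        // => [Is, it, (, really, ), so, ?]
    }
}
```

Since this matches tokens instead of splitting, it never produces empty strings, whether or not a space precedes the punctuation.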
Upvotes: 2
Reputation: 1434
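// Note: this input has a space inserted before "?". Splitting on whitespace alone
// still leaves "met," as a single token, so punctuation is not separated in general.
String sentence = "We have not met, have we ?";
String[] splited = sentence.split("\\s+");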
Upvotes: 0