Reputation: 64
I have this code:
String s=" //wont won't won't ";
String[] w = s.split("[\\s+\\/,\\.!_\\-?;:]++");
I don't want the ' to be removed from won't, as it is part of the word, but in //wont I do want the // to be removed. Help would be appreciated.
So my question is: how do I use a regex in Java so that a punctuation mark is not removed when it is part of a word, like the ' in "won't", while still keeping this pattern:
"[\\s+\\/,\\.!_\\-?;:]++"
Upvotes: 2
Views: 71
Reputation: 627077
You can use
String[] w = s.split("[\\s+/,.!_\\-?;:]+|\\B'|'\\B");
See the regex demo. Details:
- [\s+/,.!_\-?;:]+ - one or more whitespace characters, +, /, ,, ., !, _, -, ?, ; or :
- | - or
- \B' - a ' that is at the start of the string or immediately preceded by a non-word char
- | - or
- '\B - a ' that is at the end of the string or immediately followed by a non-word char.

See the Java demo:
// Requires: import java.util.Arrays;
String s = " //wont won't won't ";
String[] w = s.split("[\\s+/,.!_\\-?;:]+|\\B'|'\\B");
System.out.println(Arrays.toString(w));
// => [, wont, won't, won't]
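To check which apostrophes the two \B alternatives treat as delimiters, here is a minimal sketch that tests only that part of the pattern. The class name ApostropheBoundaryDemo and the helper hasEdgeApostrophe are made up for illustration and are not part of the code above.

```java
import java.util.regex.Pattern;

public class ApostropheBoundaryDemo {
    // Only the apostrophe alternatives from the split pattern (hypothetical constant name).
    static final Pattern EDGE_APOSTROPHE = Pattern.compile("\\B'|'\\B");

    // True if the text contains an apostrophe that is NOT surrounded by word chars on both sides.
    public static boolean hasEdgeApostrophe(String text) {
        return EDGE_APOSTROPHE.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"won't", "'wont", "wont'", "a ' b"}) {
            System.out.println(sample + " -> " + hasEdgeApostrophe(sample));
        }
        // won't -> false
        // 'wont -> true
        // wont' -> true
        // a ' b -> true
    }
}
```

An apostrophe with a word character on both sides sits between two word boundaries, so neither \B alternative can match it, and it stays in the token.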
You may get rid of the empty entries at the start if you remove all matches at the start of the string first:
String regex = "[\\s+/,.!_\\-?;:]+|\\B'|'\\B";
String[] w2 = s.replaceFirst("^(?:"+regex+")+", "").split(regex);
System.out.println(Arrays.toString(w2));
// => [wont, won't, won't]
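One caveat, as far as I can tell: when an edge apostrophe sits right next to another delimiter (as in "'quoted' won't"), the apostrophe and the following space are two separate matches, so split can return an empty string between them. A possible variation, sketched below, wraps all the alternatives in one repeated group and filters out any empty tokens. The names WordSplitter, DELIMITERS and words are hypothetical, and the grouped pattern is an assumption of mine layered on the regex above, not a drop-in from it.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordSplitter {
    // Same delimiters as above, but grouped so that a mixed run of delimiters
    // (e.g. an edge apostrophe followed by a space) is consumed as one match.
    private static final Pattern DELIMITERS =
            Pattern.compile("(?:[\\s+/,.!_\\-?;:]|\\B'|'\\B)+");

    // Splits the text and drops empty tokens, such as the one produced
    // when the input starts with a delimiter.
    public static List<String> words(String text) {
        return DELIMITERS.splitAsStream(text)
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(" //wont won't won't "));
        // => [wont, won't, won't]
        System.out.println(words("'quoted' won't dogs'"));
        // => [quoted, won't, dogs]
    }
}
```

Filtering empty tokens also makes the separate replaceFirst step unnecessary, at the cost of returning a List instead of an array.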
Upvotes: 1