Reputation: 129
I need to remove all symbols from a text except letters and spaces. Standalone apostrophes (like " ' " or "this ' is") should be removed too, but apostrophes that are part of a word (like "word'", "that's" or "'word") should be left as is.
I tried the String.replaceAll("[^a-z'\\s]","") method, and it seems I need to add something like [^([a-z]*'[a-z]+|[a-z]+'[a-z]*)] to it, but I can't put together the complete expression, and the second part of the expression seems to be invalid.
Thanks for help!
Upvotes: 1
Views: 53
Reputation: 626754
You can use
s.replaceAll("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)","")
See the regex demo. Details:
[^a-zA-Z\s'] - any char other than an ASCII letter, whitespace, or a single quotation mark
| - or
(?<!\S)'(?!\S) - a ' that is neither preceded nor followed by a non-whitespace char.
See a Java demo:
String s = " ' this ' is word' that's 'word";
System.out.println(s.replaceAll("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)",""));
// => "  this  is word' that's 'word" (the whitespace around each removed apostrophe is kept)
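If it helps, here is a minimal self-contained sketch of the same idea wrapped in a reusable class. The class name ApostropheCleaner and the helpers clean and cleanAndNormalize are hypothetical names made up for illustration. The whitespace-collapsing step is an assumption about what you might want afterwards; it is not something the question asks for.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
public class ApostropheCleaner {

    // Same pattern as above: any char that is not an ASCII letter, whitespace
    // or apostrophe, OR an apostrophe with no non-whitespace char on either side.
    private static final Pattern UNWANTED =
            Pattern.compile("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)");

    // Removes unwanted symbols and standalone apostrophes; whitespace is untouched.
    static String clean(String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    // Optional extra step (an assumption, not part of the question):
    // trim the result and collapse whitespace runs into a single space.
    static String cleanAndNormalize(String s) {
        return clean(s).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String s = "it's 100% 'fine' ' ok";
        System.out.println("[" + clean(s) + "]");             // [it's  'fine'  ok]
        System.out.println("[" + cleanAndNormalize(s) + "]"); // [it's 'fine' ok]
    }
}
```

One thing to keep in mind: the lookarounds are checked against the original text, so an apostrophe that touches a removed symbol (as in "5'") is kept even though it ends up standing alone in the result. If that matters for your input, you could run clean a second time on its own output.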
Upvotes: 2