ko4evneg

Excluding standalone apostrophes from the text

I need to remove all characters from a text except letters and spaces. Standalone apostrophes (like " ' " or "this ' is") should be removed too, but apostrophes that are part of a word (like "word'", "that's" or "'word") should be kept as is. I tried the String.replaceAll("[^a-z'\\s]","") method, and it seems I need to add something like [^([a-z]*'[a-z]+|[a-z]+'[a-z]*)] to it, but I can't build the complete expression, and the second part of the expression does not seem to be valid.
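A minimal Java sketch of the first attempt described above, run on the same sample string the answer uses (the class name FirstAttempt and the method attempt are hypothetical). Every character of that string is a lowercase letter, whitespace or ', so the character class on its own removes nothing and the standalone apostrophes survive:

```java
public class FirstAttempt {
    // The attempt from the question: keep only lowercase ASCII letters, ' and whitespace.
    static String attempt(String s) {
        return s.replaceAll("[^a-z'\\s]", "");
    }

    public static void main(String[] args) {
        String s = " ' this ' is word' that's 'word";
        // Every char of s is allowed by the class, so nothing is removed,
        // including the two standalone apostrophes.
        System.out.println(attempt(s).equals(s)); // true
    }
}
```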

Thanks for the help!

Answers (1)

Wiktor Stribiżew

You can use

s.replaceAll("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)","")

See the regex demo. Details:

  • [^a-zA-Z\s'] - any char other than an ASCII letter, a whitespace char or a single quotation mark
  • | - or
  • (?<!\S)'(?!\S) - a ' that is neither preceded nor followed by a non-whitespace char, i.e. a ' with whitespace or the start/end of the string on both sides
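To see what the second alternative matches on its own, here is a small sketch that lists the positions of the standalone apostrophes in the sample string (StandaloneApostrophes and standaloneIndices are hypothetical names, not part of the answer). The lookbehind and lookahead only check that no non-whitespace char touches the ', so the string boundary qualifies as well:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StandaloneApostrophes {
    // The lookaround alternative from the answer: a ' with no non-whitespace
    // char immediately before it and none immediately after it.
    private static final Pattern STANDALONE = Pattern.compile("(?<!\\S)'(?!\\S)");

    // Hypothetical helper: returns the indices of the standalone apostrophes in s.
    static List<Integer> standaloneIndices(String s) {
        List<Integer> indices = new ArrayList<>();
        Matcher m = STANDALONE.matcher(s);
        while (m.find()) {
            indices.add(m.start());
        }
        return indices;
    }

    public static void main(String[] args) {
        // Only the two free-floating ' match; word', that's and 'word do not.
        System.out.println(standaloneIndices(" ' this ' is word' that's 'word")); // [1, 8]
        // Start and end of the string count as "no non-whitespace char", so a lone ' matches.
        System.out.println(standaloneIndices("'")); // [0]
    }
}
```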

See a Java demo:

String s = " ' this ' is word' that's 'word";
System.out.println(s.replaceAll("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)",""));
// =>   this  is word' that's 'word
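One caveat worth checking against your data: the lookarounds are evaluated against the original string, so a ' whose only neighbour is a char that the first alternative removes (for example "hello !' world" or " '5 ") is kept, and it ends up standalone after the replacement. A hedged two-pass sketch (the class TwoPassCleaner and the method clean are hypothetical) removes the disallowed chars first and only then removes the standalone apostrophes:

```java
public class TwoPassCleaner {
    static String clean(String s) {
        // Pass 1: drop everything except ASCII letters, whitespace and '.
        String allowedOnly = s.replaceAll("[^a-zA-Z\\s']", "");
        // Pass 2: drop every ' that now has no non-whitespace neighbour.
        return allowedOnly.replaceAll("(?<!\\S)'(?!\\S)", "");
    }

    public static void main(String[] args) {
        String s = "hello !' world";
        // One pass: the ' is preceded by '!' in the original string, so it is kept.
        System.out.println("[" + s.replaceAll("[^a-zA-Z\\s']|(?<!\\S)'(?!\\S)", "") + "]"); // [hello ' world]
        // Two passes: '!' is gone before the lookarounds are checked.
        System.out.println("[" + clean(s) + "]"); // [hello  world]
    }
}
```

On the sample string from the answer, both versions produce the same result; they only differ when a ' touches a char that gets removed.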
