flgdev

Reputation: 487

Regular expression in Java, shared symbols

I'm trying to remove all words shorter than 3 characters from a string. I have the following code:

String s = "a abc ab ab ab abc ab";
s = s.replaceAll("(^|\\s)([a-z]{1,2})(\\s|$)", "$1$3");

I run it, but output is

 abc  ab  abc 

I suppose the problem is that the three words in " ab ab ab " share the same whitespace characters, so the second "ab" is not matched by the regex. How can I make this work properly?

Upvotes: 0

Views: 50

Answers (2)

Joe

Reputation: 917

Use a word boundary \b instead and delete all words that are too short:

s = s.replaceAll("\\b[a-z]{1,2}\\b", "");
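For illustration, here is a small self-contained sketch of the word-boundary approach applied to the string from the question. The class name and the `normalizeSpaces` helper are hypothetical additions, not part of the answer. The helper is there because removing only the words leaves the surrounding spaces in place.

```java
final class WordBoundaryDemo {
    // Removes lowercase words of one or two letters using word boundaries.
    static String removeShortWords(String s) {
        return s.replaceAll("\\b[a-z]{1,2}\\b", "");
    }

    // Hypothetical cleanup step (an assumption, not from the answer):
    // collapse runs of whitespace into one space and trim both ends.
    static String normalizeSpaces(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String s = "a abc ab ab ab abc ab";
        String removed = removeShortWords(s);
        // Brackets make the leftover whitespace visible.
        System.out.println("[" + removed + "]");                  // [ abc    abc ]
        System.out.println("[" + normalizeSpaces(removed) + "]"); // [abc abc]
    }
}
```

Because `\b` is a zero-width assertion, it consumes no whitespace, so adjacent short words no longer compete for the same space character.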

Upvotes: 1

Avinash Raj

Reputation: 174786

You could try the following regex, which uses a positive lookahead.

string.replaceAll("\\s[a-z]{1,2}(?=\\s|$)|^[a-z]{1,2}\\s", "");

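  • \\s[a-z]{1,2}(?=\\s|$) matches a one- or two-letter word together with the whitespace character before it. The lookahead only checks for following whitespace or the end of the string without consuming it, so that whitespace is still available to the next match.

  • ^[a-z]{1,2}\\s matches a one- or two-letter word at the start of the string together with the whitespace character after it.

  • Replacing the matched characters with an empty string gives the desired output.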
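As a quick check, the regex above can be run against the string from the question. This is a minimal sketch; the class and method names are hypothetical and chosen only for the example.

```java
final class LookaheadDemo {
    // Removes short lowercase words together with one adjacent whitespace
    // character: the preceding one in general, the following one at the start.
    static String removeShortWords(String s) {
        return s.replaceAll("\\s[a-z]{1,2}(?=\\s|$)|^[a-z]{1,2}\\s", "");
    }

    public static void main(String[] args) {
        String s = "a abc ab ab ab abc ab";
        System.out.println(removeShortWords(s)); // abc abc
    }
}
```

Note that a string consisting of a single short word with no whitespace, such as "ab", is left unchanged, because both alternatives require an adjacent whitespace character.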

Upvotes: 0
