paras2682

Reputation: 541

Removing single-letter words using Java pattern matching

I want to eliminate all single-letter words from a string in Java using pattern matching. I've coded it as follows:

    String inputStr = "P@";

    // remove single-char words and extra whitespace
    inputStr = inputStr.replaceAll("\\b[\\w']{1}\\b", "").replaceAll("\\s+", " ").trim();

I expect the output to be P@, since the input is not a single-letter word, but I'm getting @ because the pattern removes P. It seems to treat only word characters (letters, digits and underscore) as part of a word, so P counts as a one-character word because @ is not a word character. What I want is to match on the length of each whitespace-separated token instead.
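A minimal, self-contained reproduction of the behavior described above. The class and method names are hypothetical; only the pattern and the whitespace clean-up come from the question.

```java
class WordBoundaryRepro {
    // The question's pattern: exactly one word character or apostrophe between two word boundaries,
    // followed by collapsing whitespace and trimming.
    static String removeSingleCharWords(String inputStr) {
        return inputStr.replaceAll("\\b[\\w']{1}\\b", "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // '@' is not a word character, so there is a word boundary between 'P' and '@',
        // and 'P' on its own satisfies \b[\w']\b.
        System.out.println(removeSingleCharWords("P@")); // prints @
    }
}
```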

Please help.

Upvotes: 4

Views: 2236

Answers (4)

Lpc_dark

Reputation: 2952

The test case is:

    asd df R# $R $$ $ 435 4ee 4 hey buddy this is a test i@ wanted

    "[!-~]?\\b[A-z]\\b[!-~]?"
    "[!-~]?\\b[\\w]\\b[!-~]?"

The outputs for the two regexes above (after collapsing extra whitespace) are:

    asd df $$ $ 435 4ee 4 hey buddy this is test wanted
    asd df $$ $ 435 4ee hey buddy this is test wanted

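Notice that the 4 is missing from the second output: \w also matches digits, so the second regex removes single digits too. I didn't know whether a single digit counted as a word or not. Note also that [A-z] matches more than letters: the range includes the characters [ \ ] ^ _ and ` that sit between Z and a in ASCII, so [A-Za-z] is the stricter choice.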
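A sketch that runs both regexes from this answer on the test string. The class name is hypothetical, and the whitespace-collapsing step is an assumption borrowed from the question, since the outputs above are single-spaced.

```java
class SingleCharRegexDemo {
    // Applies a removal regex, then collapses whitespace the same way the question does.
    static String clean(String input, String regex) {
        return input.replaceAll(regex, "").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String test = "asd df R# $R $$ $ 435 4ee 4 hey buddy this is a test i@ wanted";
        // First regex: one character from [A-z] (this range also covers [ \ ] ^ _ and `)
        System.out.println(clean(test, "[!-~]?\\b[A-z]\\b[!-~]?"));
        // Second regex: one \w character, so a lone digit such as 4 is removed as well
        System.out.println(clean(test, "[!-~]?\\b[\\w]\\b[!-~]?"));
    }
}
```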

Upvotes: 0

Mikkel Løkke

Reputation: 3749

Try this regex:

    \s([^\s]{1})\s

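This should match a single non-whitespace character delimited by whitespace on either side. Because the surrounding whitespace is part of the match, it will not match a single character at the very start or end of the string, and of two single characters separated by one space only the first is matched. If you need to accept non-whitespace characters like ',' and '.' as delimiters, you will need to add those.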
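A sketch comparing the regex above, written as a Java string literal, with a lookaround variant. The lookaround version is a swapped-in technique, not part of this answer: (?<!\S)\S(?!\S) checks the neighboring characters without consuming them, so a lone character at the start or end of the string also counts. The class, constants and input are hypothetical, and replacing matches with a space before collapsing whitespace is an assumption.

```java
class LookaroundDemo {
    // The answer's regex: consumes the whitespace on both sides of the single character.
    static final String DELIMITED = "\\s([^\\s]{1})\\s";

    // Hypothetical variant: a non-whitespace char with no non-whitespace neighbor on either side.
    // Lookarounds are zero-width, so start/end of string and adjacent single chars are handled.
    static final String LOOKAROUND = "(?<!\\S)\\S(?!\\S)";

    static String clean(String input, String regex) {
        // Replace with a space so neighboring words are never glued together
        return input.replaceAll(regex, " ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "x ab c de f";
        System.out.println(clean(input, DELIMITED));  // x ab de f
        System.out.println(clean(input, LOOKAROUND)); // ab de
        System.out.println(clean("P@", LOOKAROUND));  // P@
    }
}
```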

Upvotes: 0

Ankur Shanbhag

Reputation: 7804

Try using this:

    String data = "asd df R# $R $$ $ 435 4ee 4";

    String replaceAll = data.replaceAll("(\\s.\\s)|(\\s.$)", " ");
    System.out.println(replaceAll);

Output: asd df R# $R $$ 435 4ee (the result also ends with a space, because the trailing " 4" is replaced by " ")
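The snippet wrapped in a hypothetical class so it can be run as-is. Printing the result between brackets makes the trailing space visible; the second, hypothetical input shows that a single character at the very start of the string is not matched by this pattern.

```java
class SpaceDelimitedDemo {
    // Same pattern as the answer: a single char between two whitespace chars,
    // or a whitespace char followed by a single char at the end of the string.
    static String removeSingles(String data) {
        return data.replaceAll("(\\s.\\s)|(\\s.$)", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + removeSingles("asd df R# $R $$ $ 435 4ee 4") + "]"); // [asd df R# $R $$ 435 4ee ]
        System.out.println("[" + removeSingles("a bc d") + "]");                      // [a bc ]
    }
}
```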

Upvotes: 2

Meherzad

Reputation: 8563

Use this:

    // replace with a space (not "") so the words around a removed letter stay separated
    str = str.replaceAll("(^.$|\\s.\\s|^.\\s|\\s.$)", " ").replaceAll("\\s+", " ").trim();
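A runnable sketch of the alternation pattern on a hypothetical sentence, comparing an empty replacement with a single-space replacement. With "" the words on either side of a removed letter end up glued together, which the final \s+ clean-up cannot undo. The class and method names are hypothetical.

```java
class AlternationDemo {
    // Whole string is one char, or one char between spaces, at the start, or at the end
    static final String SINGLE = "(^.$|\\s.\\s|^.\\s|\\s.$)";

    // replacement is the string substituted for each single-character match
    static String clean(String str, String replacement) {
        return str.replaceAll(SINGLE, replacement).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("this is a test", ""));  // this istest
        System.out.println(clean("this is a test", " ")); // this is test
        System.out.println(clean("P@", " "));             // P@
    }
}
```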

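The problem with your solution is that \b does not look for whitespace: it is a zero-width word boundary that matches wherever a word character meets a non-word character (or the start or end of the string). In P@ there is a boundary between P and @, so P on its own satisfies \b[\w']\b and gets removed.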

\b

Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.
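A small hypothetical helper that lists the positions where \b matches, to make the definition above concrete for the question's input. It relies on Matcher.find(), which reports each zero-width boundary as an empty match and then advances past it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryPositions {
    // Returns every index at which \b matches (each match is empty, so start() is the position)
    static List<Integer> boundaries(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Start of "P@" (before the word char P) and between P and @; not at the end, after @
        System.out.println(boundaries("P@")); // [0, 1]
        // Between @ and P, and at the end because the last char P is a word char
        System.out.println(boundaries("@P")); // [1, 2]
    }
}
```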

Refer to a regular-expression reference for more details.

Upvotes: 0
