Reputation: 21
Can anyone help me with a regular expression that replaces all the single-letter words with spaces? Example:
input: "this is a t f with u f array"
output: "this is with array".
My regular expression is replaceAll("(\\s+[a-z]\\s+)"," ");
But it works as follows:
input: "this is a t f with u f array"
output: "this is t f with f array".
Upvotes: 2
Views: 2195
Reputation: 114420
You can try word boundaries:
"this is a t f with u f array".replaceAll("\\b[a-z]\\b"," ")
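A minimal runnable sketch of the word-boundary approach, assuming the input is lowercase ASCII as in the question. \b[a-z]\b turns each lone letter into a space but leaves the surrounding spaces in place, so the gaps become runs of spaces. The second replaceAll("\\s+", " ") pass that collapses them is an addition for illustration and is not part of the answer above. The class and method names are hypothetical.

```java
// Sketch: word-boundary replacement, plus an optional (added) pass that
// collapses the leftover runs of whitespace into single spaces.
class WordBoundaryDemo {

    // Each lone lowercase letter becomes a space; the spaces around it stay,
    // so the result has the same length as the input.
    static String blankSingleLetters(String s) {
        return s.replaceAll("\\b[a-z]\\b", " ");
    }

    // Same as above, then squeeze every run of whitespace into one space.
    static String removeSingleLetters(String s) {
        return blankSingleLetters(s).replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "this is a t f with u f array";
        System.out.println("[" + blankSingleLetters(input) + "]");
        System.out.println("[" + removeSingleLetters(input) + "]"); // [this is with array]
    }
}
```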
Upvotes: 2
Reputation: 715
String a = "this is a t f with u f array";
a = a.replaceAll("(\\s\\p{Alpha}(?=\\s))+((?=\\s)\\s)", " ");
A zero-width positive lookahead, followed by matching the trailing space in a capture group, produces what you're looking for:
this is with array
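A runnable sketch of this lookahead-plus-capture pattern, with the backslashes doubled so it compiles as a Java string literal. Note that \p{Alpha} in Java matches ASCII letters of either case, so a lone uppercase letter is removed too. The second call shows an edge case not discussed in the answer: a single letter at the very end of the string has no whitespace after it, so the lookahead fails and it is kept. The class name is hypothetical.

```java
// Sketch: one or more " x" pieces (each checked by a lookahead), then the
// whitespace after the last one is consumed and the whole run becomes " ".
class LookaheadCaptureDemo {

    // (\s\p{Alpha}(?=\s))+ : whitespace + one letter, repeated; the lookahead
    //                        checks that whitespace follows without consuming it
    // ((?=\s)\s)           : then consume the whitespace after the last lone letter
    static String removeSingleLetters(String s) {
        return s.replaceAll("(\\s\\p{Alpha}(?=\\s))+((?=\\s)\\s)", " ");
    }

    public static void main(String[] args) {
        System.out.println(removeSingleLetters("this is a t f with u f array")); // this is with array
        System.out.println(removeSingleLetters("this is a t"));                 // this is t
    }
}
```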
Upvotes: 0
Reputation: 8222
The problem occurs because of the way replaceAll works: after each replacement, it resumes searching right after the end of the section it just matched. For example, when your pattern runs you get the result
this is t with f array
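What is happening internally is: " a " is matched and replaced, and the search resumes at "t". Because the space before "t" was already consumed as the trailing \\s+ of the first match, "t" has no leading space and is skipped, so " f " is matched next. "with" is skipped, " u " is matched, and the search resumes at "f", which again has no leading space, so it survives.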
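To watch this happen, one can walk the question's pattern with Matcher.find(), which starts each search at the end of the previous match. This is a sketch for illustration; the class and method names are made up. Only three spans match (" a ", " f ", " u "), which is why "t" and the second "f" are left behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    // The pattern from the question: whitespace, one letter, whitespace.
    private static final Pattern P = Pattern.compile("\\s+[a-z]\\s+");

    // Collect each match as "[start,end) 'text'". find() resumes at the end of the
    // previous match, so a space consumed as trailing whitespace cannot be reused
    // as the leading whitespace of the next single letter.
    static List<String> trace(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add("[" + m.start() + "," + m.end() + ") '" + m.group() + "'");
        }
        return out;
    }

    // The result of the question's replaceAll, for comparison.
    static String questionResult(String input) {
        return P.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "this is a t f with u f array";
        trace(input).forEach(System.out::println);
        System.out.println(questionResult(input)); // this is t with f array
    }
}
```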
What you need to use is a trick called "zero-width positive lookahead". If you use the pattern:
(\\s+[a-z](?=\\s))
The lookahead (?=\\s) around the second space says "try to match, but don't actually count it as part of the match". So when the next match occurs, it can use that space as part of its match.
You will also need to replace with the empty string, since the trailing space is not removed, i.e.:
"this is a t f with u f array".replaceAll("(\\s+[a-z](?=\\s))","")
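A self-contained sketch of this lookahead version. The extra inputs are edge cases added here for illustration, not taken from the answer: because the pattern needs whitespace before the letter (\\s+) and after it (the lookahead), a single letter at the very start or very end of the string is left alone. The class name is hypothetical.

```java
// Sketch: consume "whitespace + letter", only peek at the following whitespace,
// and replace the consumed part with nothing.
class LookaheadDemo {

    // \s+[a-z] is consumed; (?=\s) checks that whitespace follows the letter
    // but leaves it in place, so the next match can use it as its leading \s+.
    static String removeSingleLetters(String s) {
        return s.replaceAll("(\\s+[a-z](?=\\s))", "");
    }

    public static void main(String[] args) {
        System.out.println(removeSingleLetters("this is a t f with u f array")); // this is with array
        // No whitespace before "a" or after "b", so neither is removed:
        System.out.println(removeSingleLetters("a test of b"));                 // a test of b
    }
}
```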
Upvotes: 6
Reputation: 76898
replaceAll("\\b[a-z]\\b", " ");
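will output the following (each removed letter leaves a space behind, so the words are separated by runs of spaces):
this is       with     array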
The problem is in how replaceAll approaches things. \\s[a-z]\\s matches " a ", then moves on to "t f with u f array", which causes it to miss the first t.
Upvotes: 0
Reputation: 40168
You could use word boundaries:
String s = "this is a t f with u f array";
s = s.replaceAll("\\b\\w\\b\\s+", "");
System.out.println(s); // this is with array
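A runnable sketch of this answer's pattern with two extra inputs (added here, not from the answer) that show its boundaries. \w is [a-zA-Z_0-9] in Java, so a lone digit is removed as well, and a single letter at the very end of the string has no whitespace after it for \s+ to match, so it stays. The class name is hypothetical.

```java
// Sketch: \b\w\b matches a one-character word; \s+ removes the whitespace after it.
class WordBoundaryWhitespaceDemo {

    static String removeSingleLetters(String s) {
        return s.replaceAll("\\b\\w\\b\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeSingleLetters("this is a t f with u f array")); // this is with array
        // \w also covers digits and underscore, so "5" is removed:
        System.out.println(removeSingleLetters("room 5 is free"));               // room is free
        // Nothing follows the final "b", so \s+ cannot match and it is kept:
        System.out.println(removeSingleLetters("plan b"));                       // plan b
    }
}
```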
Upvotes: 0
Reputation: 14149
Hm... maybe it's because when " a " is found and replaced in "... a t f ..", the matcher continues from the following character, which is 't' (the space is already consumed). But then again I'd expect the output to be "this is t with f array".
Try using replaceAll("((\\s+[a-z])*\\s+)"," ")
instead. But it has the (unwanted?) side effect that any length of whitespace will be reduced to a single space.
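A runnable sketch of this pattern (backslashes doubled for a Java string literal), including an input added here to show the side effect just mentioned: every whitespace run is matched, even one between two longer words, and replaced by a single space. The class name is hypothetical.

```java
// Sketch: any number of "whitespace + lone letter" pieces, then the
// whitespace that ends them; each whole match becomes a single space.
class CollapseDemo {

    // (\s+[a-z])* : zero or more "whitespace + single letter" pieces
    // \s+         : the whitespace after the last piece (or a plain gap)
    static String removeSingleLetters(String s) {
        return s.replaceAll("((\\s+[a-z])*\\s+)", " ");
    }

    public static void main(String[] args) {
        System.out.println(removeSingleLetters("this is a t f with u f array")); // this is with array
        // Double spaces between longer words are also reduced to one:
        System.out.println(removeSingleLetters("keep  two  spaces a here"));     // keep two spaces here
    }
}
```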
Upvotes: 0