Reputation: 47
Hi folks, I am trying to remove special characters and digits from a string. Everything works fine, except that the character 'h' is printed as a prefix and suffix. I don't know why it is printed; please guide me on what mistake I have made.
String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
String str1 = str.replaceAll("[^\\p{L}\\p{M}]", " ");
System.out.println(str1);
My expected output:
Hi buddy you did a Great job
But I got:
h Hi buddy you did a Great job h
Upvotes: 1
Views: 148
Reputation: 1595
Try this program; it will work:
public class StringFunction {
    public static void main(String[] args) {
        String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
        // Strip the tags first, then replace every non-letter with a space.
        System.out.println(str.replaceAll("<[^>]+>", "").replaceAll("[^\\p{L}\\p{M}]", " "));
    }
}
Upvotes: 0
Reputation: 1372
This removes the tags, replaces all the special characters with spaces, and collapses any run of consecutive whitespace into a single space.
String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
// String str1 = str.replaceAll("[^\\p{L}\\p{M}]", " ");
String str1 = str.replaceAll("<[^>]+>", "").replaceAll("[^\\p{L}\\p{M}]", " ").replaceAll("\\s+", " ");
System.out.println(str1);
output:
Hi buddy you did a Great job
Upvotes: 0
Reputation: 86
Use this code. It will work.
String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
String str1 = str.replaceAll("<[^>]+>", "");
String str2 = str1.replaceAll("[^\\p{L}\\p{M}]", " ");
System.out.println(str2);
Upvotes: 0
Reputation: 726579
The two h's come from the <h1> tags that you have in your input source:
<h1>Hi buddy!!you @ $ did a Great job . <h1>
 ^                                       ^
 |                                       |
 +------------ Here and here ------------+
If you do not want to see them, find the tags and remove them before calling replaceAll. A quick way to do it would be applying the "<\\p{Alnum}+>" regex in a separate call of replaceAll. It is OK for learning experiments, but is too dirty for production. If you need to do this reliably, get an HTML parser to remove the tags.
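To make this concrete, here is a minimal sketch of why the h's survive and what the quick regex does about it. The class and method names (TagLetterDemo, lettersOnly, stripSimpleTags) are made up for illustration; only the two regexes come from the question and this answer.

```java
public class TagLetterDemo {
    // The regex from the question: every character that is not a
    // Unicode letter or combining mark becomes a space. The 'h' in
    // "<h1>" is a letter, so it is kept while '<', '1' and '>' are not.
    static String lettersOnly(String s) {
        return s.replaceAll("[^\\p{L}\\p{M}]", " ");
    }

    // The quick-and-dirty tag removal suggested above. It only matches
    // simple tags such as <h1>; closing tags like </h1> and tags with
    // attributes are left alone, which is one reason it is not
    // production grade.
    static String stripSimpleTags(String s) {
        return s.replaceAll("<\\p{Alnum}+>", "");
    }

    public static void main(String[] args) {
        String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + lettersOnly(str) + "]");
        System.out.println("[" + lettersOnly(stripSimpleTags(str)) + "]");
        // A closing tag does not match the simple pattern at all.
        System.out.println(stripSimpleTags("</h1>"));
    }
}
```

The first line printed still contains the stray h's; the second does not, because the tags are gone before the letter filter runs.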
Upvotes: 2
Reputation: 123508
As mentioned in the comments, you should be using an HTML parser to get rid of the tags before removing everything except the letters and the marks.
Should you insist upon using regex to remove the tags, you could instead say:
String str1 = str.replaceAll("<[^>]*>", "").replaceAll("[^\\p{L}\\p{M}]", " ");
i.e. remove the tags before ...
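If you stay with the regex route, one possible way to package it is a small helper like the sketch below. The names TextCleaner and clean are hypothetical, and precompiling the patterns, collapsing runs of non-letters with + and calling trim() are my additions rather than part of the one-liner above. It is still a regex hack, not a substitute for a real HTML parser.

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // Anything between '<' and the next '>' is treated as a tag.
    // This is an assumption that breaks on text such as "a < b > c".
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");

    // One or more characters that are neither letters nor combining marks.
    // The '+' turns each run into a single space instead of many.
    private static final Pattern NON_LETTERS = Pattern.compile("[^\\p{L}\\p{M}]+");

    static String clean(String input) {
        // Step 1: remove the tags so their letters cannot leak through.
        String noTags = TAGS.matcher(input).replaceAll("");
        // Step 2: replace the remaining non-letter runs, then drop the
        // leading and trailing space left at the edges.
        return NON_LETTERS.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
        System.out.println(clean(str)); // prints "Hi buddy you did a Great job"
    }
}
```

Precompiling with Pattern.compile only matters if clean is called many times; for a one-off, the chained replaceAll calls behave the same way.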
Upvotes: 4