user3280500

Reputation: 47

Remove special characters from a string

Hi folks, I am trying to remove special characters and digits from a string. Everything works fine, except that the character 'h' is printed as a prefix and a suffix. I don't know why it is printed; please point out what mistake I have made.

  String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
  String str1 = str.replaceAll("[^\\p{L}\\p{M}]", " ");
  System.out.println(str1);

Expected output:

Hi buddy  you did a Great job

But I got:

h Hi buddy  you did a Great job h

Upvotes: 1

Views: 148

Answers (5)

Sivakumar M

Reputation: 1595

Try this program. It removes the tags first and then replaces the remaining non-letter characters:

public class StringFunction {
    public static void main(String[] args) {
        String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
        // Strip the tags first, then replace every non-letter, non-mark character with a space.
        System.out.println(str.replaceAll("<[^>]+>", "").replaceAll("[^\\p{L}\\p{M}]", " "));
    }
}

Upvotes: 0

loknath

Reputation: 1372

This removes the tags, replaces every special character with a space, and then collapses each run of consecutive whitespace into a single space.

      String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
      // String str1 = str.replaceAll("[^\\p{L}\\p{M}]", " ");

      // Remove tags, replace non-letters with spaces, then collapse whitespace runs.
      String str1 = str.replaceAll("<[^>]+>", "")
                       .replaceAll("[^\\p{L}\\p{M}]", " ")
                       .replaceAll("\\s+", " ");

      System.out.println(str1);

output:

 Hi buddy you did a Great job 

Upvotes: 0

Madhan Shanmugam

Reputation: 86

Use this code. It removes the tags in a separate step before replacing the special characters:

     String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";       
     String str1 = str.replaceAll("<[^>]+>", "");
     String str2 = str1.replaceAll("[^\\p{L}\\p{M}]", " ");
     System.out.println(str2);

Upvotes: 0

Sergey Kalinichenko

Reputation: 726579

The two h characters come from the <h1> tags in your input. Your regex replaces <, 1, and > with spaces but keeps h, because it is a letter:

<h1>Hi buddy!!you @ $ did a Great job . <h1>
 ^                                       ^
 |                                       |
 + ------------- Here and here ----------+

If you do not want to see them, find the tags and remove them before calling replaceAll. A quick way to do it is to apply the regex "<\\p{Alnum}+>" in a separate call to replaceAll. That is OK for learning experiments, but it is too dirty for production. If you need to do this reliably, use an HTML parser to remove the tags.
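A minimal sketch of the quick two-step approach, assuming the input contains only simple tags such as <h1>. The names TagStripDemo and clean are made up for illustration. Collapsing each run of non-letters into one space and calling trim() is an extra step added here to tidy the output, not part of the suggestion itself. Note that "<\\p{Alnum}+>" does not match closing tags such as </h1> or tags with attributes.

```java
final class TagStripDemo {

    // Hypothetical helper: strips simple tags, then keeps only letters and marks.
    static String clean(String input) {
        // Step 1: remove simple tags such as <h1> (alphanumeric name only).
        String withoutTags = input.replaceAll("<\\p{Alnum}+>", "");
        // Step 2: replace each run of non-letter, non-mark characters with one space,
        // then drop the leading and trailing space (tidying step, an assumption).
        return withoutTags.replaceAll("[^\\p{L}\\p{M}]+", " ").trim();
    }

    public static void main(String[] args) {
        String str = "<h1>Hi buddy!!you @ $ did a Great job . <h1>";
        System.out.println(clean(str)); // prints: Hi buddy you did a Great job
    }
}
```

Doing the tag removal first matters: once the letters-only regex has run, the angle brackets are gone and the leftover h can no longer be told apart from ordinary text.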

Upvotes: 2

devnull

Reputation: 123508

As mentioned in the comments, you should be using an HTML parser to get rid of the tags before removing everything except the letters and the marks.

Should you insist upon using regex to remove the tags, you could instead say:

String str1 = str.replaceAll("<[^>]*>", "").replaceAll("[^\\p{L}\\p{M}]", " ");

i.e. remove the tags before ...
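To illustrate how the tag pattern behaves, here is a small sketch comparing "<[^>]*>" with the narrower "<\\p{Alnum}+>" pattern. The class and method names are hypothetical, and the sample inputs are made up. It also hints at why a regex is not a reliable substitute for a parser: the broad pattern removes anything between < and >, including text that is not a tag.

```java
final class TagRegexComparison {

    // Broad pattern: removes anything between < and >.
    static String stripAnyTag(String input) {
        return input.replaceAll("<[^>]*>", "");
    }

    // Narrow pattern: removes only tags that consist of an alphanumeric name.
    static String stripSimpleTag(String input) {
        return input.replaceAll("<\\p{Alnum}+>", "");
    }

    public static void main(String[] args) {
        String html = "<h1 class=\"title\">Hi buddy</h1>";
        System.out.println(stripAnyTag(html));    // prints: Hi buddy
        System.out.println(stripSimpleTag(html)); // prints the input unchanged

        // The broad pattern also eats non-tag text between comparison operators.
        System.out.println(stripAnyTag("a < b and c > d")); // "< b and c >" is removed
    }
}
```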

Upvotes: 4
