D.Shefer

Reputation: 173

Java: remove a pattern from a string using regex

I need to strip the following substrings from my string:

\n

\uXXXX (X being a digit or a character)

e.g. "OR\n\nThe Central Site Engineering\u2019s \u201cfrontend\u201d, where developers turn to"

-> "OR The Central Site Engineering frontend , where developers turn to"
I tried using the String method replaceAll, but I don't know how to handle the \uXXXX sequences, and it didn't work for \n either:

String s = "\\n";  
data=data.replaceAll(s," ");

What should this regex look like in Java?

thanks for the help

Upvotes: 10

Views: 53876

Answers (2)

Pshemo

Reputation: 124215

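The problem with string.replaceAll("\\n", " "); is that replaceAll expects a regular expression, and \ is a special character in regex. It is used, for instance, to write predefined character classes like \d (digits) or to escape regex metacharacters like +. The Java literal "\\n" therefore becomes the regex \n, which matches a line-feed character, not a backslash followed by n.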

So if you want to match a literal \ in Java's regex, you need to escape it twice:

  • once for the regex: \\
  • and once more for the Java string literal: "\\\\"

as in replaceAll("\\\\n"," ").
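
To make the two escaping levels concrete, here is a minimal sketch. The class name EscapeLevels and the helper stripLiteralBackslashN are made up for illustration; only String.replaceAll comes from the JDK.

```java
public class EscapeLevels {
    // Hypothetical helper: replaces each literal backslash-n pair with a space.
    static String stripLiteralBackslashN(String input) {
        // Java literal "\\\\n" -> regex \\n -> matches a backslash followed by 'n'
        return input.replaceAll("\\\\n", " ");
    }

    public static void main(String[] args) {
        String literal = "a\\nb";   // four chars: a, backslash, n, b
        String newline = "a\nb";    // three chars: a, line feed, b

        // "\\n" is the regex \n, which matches a real line feed only
        System.out.println(literal.replaceAll("\\n", " ").equals(literal)); // true
        System.out.println(newline.replaceAll("\\n", " "));                 // a b
        System.out.println(stripLiteralBackslashN(literal));                // a b
    }
}
```

The first call leaves the string untouched, which is the behaviour described in the question.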

You can also avoid the regex escaping altogether by using the replace method, which treats its target as literal text rather than as a pattern:

replace("\\n"," ")
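
As a quick check of the literal-matching route, here is a small sketch comparing replace with replaceAll on a quoted pattern. The class and method names are hypothetical; Pattern.quote is the standard java.util.regex way to have the engine match a string character for character.

```java
import java.util.regex.Pattern;

public class LiteralReplace {
    // Hypothetical helper: String.replace does no regex parsing of its target.
    static String viaReplace(String s) {
        return s.replace("\\n", " ");
    }

    // Hypothetical helper: Pattern.quote wraps the text so the regex engine
    // treats every character in it literally.
    static String viaQuotedRegex(String s) {
        return s.replaceAll(Pattern.quote("\\n"), " ");
    }

    public static void main(String[] args) {
        String data = "OR\\n\\nThe Central Site";
        // Each backslash-n pair becomes one space, so two spaces follow "OR"
        System.out.println(viaReplace(data));
        System.out.println(viaQuotedRegex(data));
    }
}
```

Both calls should give the same result; replace is simply the shorter way to write it when no pattern features are needed.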

To remove the \uXXXX sequences (a backslash, u, and four hex digits), we can use

replaceAll("\\\\u[0-9a-fA-F]{4}","")


Also remember that Strings are immutable, so a str.replace(..) call doesn't change str; it returns a new String. If you want to keep that new string in str, you need to assign it back:

str = str.replace(..)
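
A short sketch of the difference, with hypothetical class and method names:

```java
public class ImmutableDemo {
    // Hypothetical helper: calls replace but ignores the String it returns.
    static String withoutReassignment(String str) {
        str.replace("\\n", " ");   // result is discarded, str is unchanged
        return str;
    }

    // Hypothetical helper: stores the new String back into the variable.
    static String withReassignment(String str) {
        str = str.replace("\\n", " ");
        return str;
    }

    public static void main(String[] args) {
        System.out.println(withoutReassignment("a\\nb")); // a\nb
        System.out.println(withReassignment("a\\nb"));    // a b
    }
}
```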

So your solution can look like

String text = "\"OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to\"";

text = text.replaceAll("(\\\\n)+"," ")
           .replaceAll("\\\\u[0-9a-fA-F]{4}", "");
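
If the cleanup runs on many strings, the same two steps can be wrapped in a method with precompiled patterns. This is only a sketch: TextCleaner and clean are made-up names, and the input is the sample from the question without the surrounding quotes.

```java
import java.util.regex.Pattern;

public class TextCleaner {
    // One or more consecutive literal backslash-n pairs
    private static final Pattern ESCAPED_NEWLINES = Pattern.compile("(\\\\n)+");
    // A literal backslash, the letter u, then exactly four hex digits
    private static final Pattern ESCAPED_UNICODE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    // Hypothetical helper: collapses runs of backslash-n into one space,
    // then deletes the escaped Unicode sequences.
    static String clean(String text) {
        String spaced = ESCAPED_NEWLINES.matcher(text).replaceAll(" ");
        return ESCAPED_UNICODE.matcher(spaced).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to";
        System.out.println(clean(text));
        // OR The Central Site Engineerings frontend, where developers turn to
    }
}
```

Note that only the escape sequences themselves are removed, so the "s" after the apostrophe stays in the output.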

Upvotes: 14

Roel Strolenberg

Reputation: 2950

Best to do this in 2 parts I guess:

String ex = "OR\\n\\nThe Central Site Engineering\\u2019s \\u201cfrontend\\u201d, where developers turn to";
String part1 = ex.replaceAll("\\\\n", " "); // regex \\n: an escaped backslash followed by n
String part2 = part1.replaceAll("\\\\u[0-9a-fA-F]{4}", ""); // a backslash, u, then four hex digits
System.out.println(part2);

Try it =)

Upvotes: 0
