Chris

Reputation: 1311

Remove special characters java

Hi, I'm trying to figure out how to remove the tags from the results returned by the Google Feed API. A typical result looks like this:

   Breaking \u003cb\u003eNews\u003c/b\u003e Updates

How can we remove these characters? I'm not sure whether a regex is a good fit here. Does anyone have an idea how to remove them? Google does not provide an option to strip tags from the results in Java.

Upvotes: 2

Views: 2389

Answers (4)

Dunes

Reputation: 40683

This is HTML. \u003cb\u003e is the Unicode-escaped form of <b>: \u003c is < and \u003e is >.
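If the escapes reach your code as literal text (for example, when the raw response is read without a JSON parser), they can be decoded before any tag handling. Here is a minimal sketch using only java.util.regex. The class name UnicodeEscapeDecoder and its decode method are made up for illustration, and it only handles four-digit escapes, not the rest of JSON string syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Matches a literal backslash, then 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each escape with the character it encodes.
    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslashes in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes literal, as they would be in raw response text.
        String raw = "Breaking \\u003cb\\u003eNews\\u003c/b\\u003e Updates";
        System.out.println(decode(raw)); // prints: Breaking <b>News</b> Updates
    }
}
```

After decoding, the string contains ordinary <b> and </b> tags, which is what the approaches below operate on.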

You'll want to use an HTML parser, as HTML cannot be fully parsed by a regular expression.

With a library like Jsoup you could do this as follows:

String data = Jsoup.parse(html).body().text();

This will get you "Breaking News Updates".

Upvotes: 0

Rohit Jain

Reputation: 213213

You can use the regex below:

String str = "Breaking \u003cb\u003eNews\u003c/b\u003e Updates";
str = str.replaceAll("\\<(.*)?\\>(.*)\\</\\1\\>", "$2");
System.out.println(str);

Output:

Breaking News Updates

  • \\<(.*)?\\> matches the opening tag, <b>
  • \\</\\1\\> matches the corresponding closing tag, </b>
  • \\1 is a backreference to the tag name captured by the first group, so only a correctly paired opening and closing tag are matched.

So for <b>news <update></b>, only the <b> pair is removed and <update> is left in place.
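For comparison, here is a rough sketch of a simpler pattern that drops anything shaped like a tag, paired or not. This is not the regex from this answer, just an alternative under the assumption that the input is a short snippet with no '>' inside attribute values; the class and method names are made up for illustration.

```java
class TagStripper {
    // Hypothetical helper: removes every run that starts with '<' and ends at the next '>'.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("Breaking <b>News</b> Updates")); // prints: Breaking News Updates
        // Unpaired tags are removed as well, unlike with the backreference pattern.
        System.out.println("[" + stripTags("<b>news <update></b>") + "]"); // prints: [news ]
    }
}
```

For anything beyond simple snippets like this, an HTML parser is the safer choice.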

Upvotes: 0

nicholas_jordan

Reputation: 31

I routinely strip those with

str = str.replaceAll("\\p{Cntrl}", "");
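A quick sketch of what that pattern does, in case it helps. In Java's regex syntax \p{Cntrl} is the POSIX class for ASCII control characters (tabs, newlines and the like), so tags that have already been decoded to <b> are left untouched. The class and method names here are made up for illustration.

```java
class ControlCharDemo {
    // Hypothetical helper wrapping the replaceAll call above.
    static String stripControl(String s) {
        return s.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        // Tab, carriage return and line feed are control characters and get removed.
        System.out.println(stripControl("Breaking\tNews\r\n")); // prints: BreakingNews
        // Angle brackets are ordinary printable characters, so the tags survive.
        System.out.println(stripControl("<b>News</b>")); // prints: <b>News</b>
    }
}
```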

Upvotes: 1

Vinay Bedre

Reputation: 466

The best solution would be to run the data through a JSON parser.

JSON.parse(JSON.stringify({a : '<put your string here>'}));

This is appropriate because the data you get from the Google API is JSON, and a JSON parser decodes escapes such as \u003c for you.

Upvotes: 0
