user2928282

Reputation: 1

Regex to remove special characters (Latin-1)

I'm having trouble removing the following special characters:

input:

Curiosity Finds “Surprising†Amounts of Water, Perchlorate On Mars

desired output:

Curiosity Finds "Surprising" Amounts of Water, Perchlorate On Mars

I just need to convert “ into ".

Thanks in advance, Rohit

Upvotes: 0

Views: 1637

Answers (2)

sdanzig

Reputation: 4500

Here's the exact thing you're asking for:

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String str = "Curiosity Finds “Surprising†Amounts of Water, Perchlorate On Mars";
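        // Match U+00E2 U+20AC, optionally followed by U+0153, so that both the
        // opening and the closing garbled sequences become a plain double quote.
        str = str.replaceAll("\\u00E2\\u20AC\\u0153?", "\"");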
        System.out.println("str="+str);
    }
}

Output:

str=Curiosity Finds "Surprising" Amounts of Water, Perchlorate On Mars

You can try it out here: http://ideone.com/WHCXUj

For future reference, a handy online Unicode character lookup is here: http://unicodelookup.com

Here's how I used it, for instance: http://unicodelookup.com/#“/1
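
If you'd rather derive those code points in code than look them up, here is a minimal sketch. It assumes the garbled text came from UTF-8 bytes being decoded as windows-1252, which is a guess about where the input came from, not something the question confirms. The class and method names (MojibakeDemo, misdecode, codePoints) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode s as UTF-8, then decode those bytes as windows-1252.
    // This reproduces the assumed mis-decoding step.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Render every char of s in U+XXXX notation, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+201C is the left double quotation mark.
        String garbled = misdecode("\u201C");
        System.out.println(codePoints(garbled)); // prints U+00E2 U+20AC U+0153
    }
}
```

The three code points printed are the same ones used in the replaceAll pattern above.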

Upvotes: 0

anubhava

Reputation: 785196

One way to achieve that is to remove all non-ASCII characters, like this:

str = str.replaceAll("[^\\u0000-\\u007f]+", "");
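
A small sketch of what this does to a shortened version of the sample headline. The garbled sequences are written as Unicode escapes, on the assumption that they are U+00E2 U+20AC U+0153 (opening) and U+00E2 U+20AC (closing); the class and method names are hypothetical. Note that the garbled characters are deleted rather than turned into quotes.

```java
public class StripNonAscii {
    // Remove every run of characters outside the ASCII range U+0000..U+007F.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\u0000-\\u007f]+", "");
    }

    public static void main(String[] args) {
        // Shortened sample input with the garbled quote sequences as escapes.
        String input = "Finds \u00E2\u20AC\u0153Surprising\u00E2\u20AC Amounts";
        System.out.println(stripNonAscii(input)); // prints Finds Surprising Amounts
    }
}
```

So this fits when dropping the quotes is acceptable; to get the desired output with " in place, the garbled sequences need to be replaced rather than stripped.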

Upvotes: 1
