prabu

How to convert an em dash in Java

When end users submit data through an HTML form in our web application, they often paste text copied from a Word document, and that text contains long dashes (em dashes).
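Pasted Word text usually contains an en dash (U+2013) or em dash (U+2014) rather than an ASCII hyphen. On submission the browser percent-encodes these characters using the form page's charset, and the server has to decode them with that same charset. A minimal sketch, assuming the page is served as UTF-8 and the form is sent as application/x-www-form-urlencoded; the sample payload, class name and helper are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {
    // Hypothetical form value "1993 – 1995" as a browser sends it from a UTF-8 page:
    // the en dash U+2013 is transmitted as the three UTF-8 bytes E2 80 93.
    static final String ENCODED = "1993+%E2%80%93+1995";

    // Decodes a form value with the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Same charset as the page: the en dash survives.
        String right = decode(ENCODED, "UTF-8");
        // Different charset: each of the three bytes becomes its own character.
        String wrong = decode(ENCODED, "ISO-8859-1");

        System.out.println(right.equals("1993 \u2013 1995")); // true
        System.out.println(wrong.length());                   // 13
    }
}
```

If the form is handled by a servlet, the usual equivalent is calling request.setCharacterEncoding("UTF-8") before reading any parameter, with the form page served in the same charset.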

Our application then reads this data back from the database and writes it to an Excel file.

In the generated Excel file, these characters appear as a kind of question mark, as shown below:

  Actual output : 1993 � 1995
Expected output : 1993 – 1995 
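If the character Excel shows is U+FFFD (the Unicode replacement character), the dash was most likely already replaced before the cell was written. A small diagnostic sketch (class and method names are hypothetical) that lists the code points of a string, so the value can be checked at each stage (after form submission, after reading from the database, before writing the cell) to find where the dash is lost:

```java
public class CodePointDump {
    // Returns each code point of s as "U+XXXX", separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Expected value: the dash is U+2013 (EN DASH).
        System.out.println(dump("93 \u2013 9")); // U+0039 U+0033 U+0020 U+2013 U+0020 U+0039
        // Corrupted value: the dash has become U+FFFD (REPLACEMENT CHARACTER).
        System.out.println(dump("93 \uFFFD 9")); // U+0039 U+0033 U+0020 U+FFFD U+0020 U+0039
    }
}
```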

I have already applied UTF-8 encoding in Java, but the Excel output is still the same. How can I solve this?

Below is an extract of my code:

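try {
    // Attempted fix: re-encode keyStrenghts as UTF-8.
    // Note: new String(byte[]) decodes with the platform default charset, not UTF-8.
    keyStrenghts = new String(keyStrenghts.getBytes("utf-8"));
} catch (UnsupportedEncodingException e) {
    // Cannot happen in practice: every JVM is required to support UTF-8.
    e.printStackTrace();
}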

I am using JDK 6 and Apache POI to generate the Excel file.
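A Java String is a sequence of UTF-16 code units, so encoding it to UTF-8 bytes and building a new String from them cannot restore a character that is already U+FFFD. Worse, new String(byte[]) without a charset decodes with the platform default, so on a non-UTF-8 platform the round trip in the code extract above can itself damage the dash. The sketch below (Java 6 compatible; class and method names are hypothetical) shows both effects, plus one plausible way the data got damaged: an en dash sent as the single windows-1252 byte 0x96 becomes U+FFFD when decoded as UTF-8. windows-1252 is not one of the charsets every JVM must support, but standard JDKs include it.

```java
import java.nio.charset.Charset;

public class RoundTripDemo {
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Encodes s with one charset and decodes the resulting bytes with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String dash = "1993 \u2013 1995";

        // Same charset both ways: a no-op, the string is unchanged.
        System.out.println(roundTrip(dash, UTF_8, UTF_8).equals(dash));      // true
        // UTF-8 bytes decoded with a Latin-1 default charset: mojibake.
        System.out.println(roundTrip(dash, UTF_8, ISO_8859_1).equals(dash)); // false

        // In windows-1252 the en dash is the single byte 0x96.
        byte[] cp1252 = dash.getBytes(WINDOWS_1252);
        // 0x96 is not valid UTF-8 on its own, so the decoder substitutes U+FFFD.
        String damaged = new String(cp1252, UTF_8);
        System.out.println(damaged.equals("1993 \uFFFD 1995"));            // true
        // Once U+FFFD is in the string, no further round trip brings the dash back.
        System.out.println(roundTrip(damaged, UTF_8, UTF_8).equals(dash)); // false
    }
}
```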

Answers (2)

George Ninan

The Unicode code point for � is U+FFFD, written "\uFFFD" in Java:

keyStrenghts = "1993 \uFFFD 1995";
if (keyStrenghts.contains("\uFFFD")) {
    keyStrenghts = keyStrenghts.replace("\uFFFD", "-");
}

Now if you print keyStrenghts you will get: 1993 - 1995 (with an ASCII hyphen; use "\u2013" as the replacement if you want an en dash).
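A self-contained version of this workaround, as a sketch (class and method names are hypothetical). Because U+FFFD only records that some character could not be decoded, this rewrites every lost character, not just dashes; the caller chooses whether the substitute is an ASCII hyphen or an en dash (U+2013):

```java
public class ReplacementCharFix {
    static final String REPLACEMENT_CHAR = "\uFFFD";

    // Replaces every U+FFFD in s with the given substitute.
    // String.replace is a literal (non-regex) replacement and leaves s unchanged
    // when there is nothing to replace, so no contains() check is needed.
    static String fix(String s, String substitute) {
        return s.replace(REPLACEMENT_CHAR, substitute);
    }

    public static void main(String[] args) {
        String keyStrenghts = "1993 \uFFFD 1995";
        System.out.println(fix(keyStrenghts, "-"));                                // 1993 - 1995
        System.out.println(fix(keyStrenghts, "\u2013").equals("1993 \u2013 1995")); // true
    }
}
```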

hack_on

If your problem is limited to dashes such as the em dash, this might solve it:

keyStrenghts = keyStrenghts.replaceAll("\\p{Pd}", "-");

This uses a regex to replace every dash character (the Unicode "dash punctuation" category, \p{Pd}) with an ASCII "-", as stated here.
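\p{Pd} is the Unicode dash punctuation category, which java.util.regex.Pattern supports; it covers the ASCII hyphen-minus, the en dash and the em dash, among others. It does not include U+FFFD, so this only helps if it runs while the real dash is still in the string, for example right after the form value is decoded and before it is stored. A minimal sketch that precompiles the pattern (class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class DashNormalizer {
    // Unicode general category Pd (dash punctuation): hyphen-minus, en dash, em dash, ...
    private static final Pattern DASHES = Pattern.compile("\\p{Pd}");

    // Replaces every dash-punctuation character with an ASCII hyphen-minus.
    static String normalizeDashes(String s) {
        return DASHES.matcher(s).replaceAll("-");
    }

    public static void main(String[] args) {
        // Works while the original dashes are still present...
        System.out.println(normalizeDashes("1993 \u2013 1995 \u2014 2001")); // 1993 - 1995 - 2001
        // ...but U+FFFD is not in category Pd, so already-damaged text is left as is.
        System.out.println(normalizeDashes("1993 \uFFFD 1995").contains("\uFFFD")); // true
    }
}
```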
