Sabeerdeen Avk

Reputation: 11

How to read an Excel file that has Arabic columns

When I read an Excel sheet, the Arabic columns are displayed as ????, while the English columns are displayed fine. I guess it is a UTF-8 issue, but I don't know what I am missing. Any help would be appreciated.

FileInputStream fis = new FileInputStream(fileName);
Workbook workbook = new XSSFWorkbook(fis);

System.out.println("Current Encoding " +
        "::" + System.getProperty("file.encoding"));

Even after making the changes listed below, I am still getting Current Encoding :: Cp1252

NetBeans 8.0.2

Added -J-Dfile.encoding=UTF-8 to netbeans_default_options

JSP (Struts 1.3)

<%@page pageEncoding="UTF-8"%>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<html:form action="/uploadApplicantAction" method="post" acceptCharset="utf-8"
           enctype="multipart/form-data">

Tomcat 8

Uncommented this filter in web.xml:

<filter>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <async-supported>true</async-supported>
</filter>

<filter-mapping>
    <filter-name>setCharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Added URIEncoding="UTF-8" to the connector in Tomcat's server.xml:

<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
               connectionTimeout="20000"
               redirectPort="8443" />

Upvotes: 1

Views: 853

Answers (1)

skomisa

Reputation: 17343

Since only your Arabic text is being rendered as backward question marks it seems probable that:

  • There is nothing wrong with the way you are reading the files. I don't think your suggestion that this may be a "utf-8 issue" is likely, since the English text is being rendered correctly. If there were an encoding/decoding issue you would probably see replacement characters in the output.
  • The most likely cause of your problem is that the font you are using for the output doesn't support Arabic.

To verify this, it is trivial to create a simple Java application that renders some Arabic text to the console:

package arabicdemo;

public class ArabicDemo {

    public static void main(String[] args) {
        // Use a font which supports Arabic, such as DejaVu Sans, Courier New or Arial Unicode MS.
        // - To set font in edit window: Tools > Options > Fonts & Colors > Syntax tab > Font
        // - To set font in Output window: Tools > Options > Miscellaneous > Output tab > Font
        System.out.println("مرحبا بالعالم"); // "Hello world" in Arabic
    }

}

Just be sure to use the appropriate font(s), as described in the comment in the code sample (since you are using NetBeans). Here is a screenshot of that application being run in NetBeans, with the edit window font set to DejaVu Sans and the Output window font set to Courier New:

[Screenshot: ArabicTextInNetBeans]

Once you have that trivial application displaying Arabic text correctly in the edit and Output windows in NetBeans, modify your application to use the same font(s).

After doing that, your application's Arabic text should render correctly when processing Excel files. If not, then at least you have eliminated the font as a potential cause of the problem, so update your question as appropriate.
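If question marks still appear after changing the font, one way to separate a display problem from lost data is to print the Unicode code points of the string instead of the string itself. The sketch below is only an illustration: the class `CodePointDump` and its methods `describe` and `containsArabic` are hypothetical names, and the hard-coded sample string stands in for whatever value your code reads from the Excel cell. Because the output is pure ASCII, it does not depend on the console font or encoding.

```java
public class CodePointDump {

    // Returns the string's code points as "U+XXXX" values separated by spaces.
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // True if at least one code point lies in the basic Arabic block (U+0600..U+06FF).
    static boolean containsArabic(String text) {
        return text.codePoints().anyMatch(cp -> cp >= 0x0600 && cp <= 0x06FF);
    }

    public static void main(String[] args) {
        // Assumption: stand-in for a value read from the spreadsheet ("marhaban").
        String fromCell = "\u0645\u0631\u062D\u0628\u0627";

        System.out.println(describe(fromCell));       // U+0645 U+0631 U+062D U+0628 U+0627
        System.out.println(containsArabic(fromCell)); // true

        // A string that already holds literal question marks looks like this:
        System.out.println(describe("????"));         // U+003F U+003F U+003F U+003F
    }
}
```

If the dump shows values in the U+06xx range, the string in memory is intact and only the rendering is at fault. If it shows U+003F, the characters were already replaced before the string reached the output step, so the font is not the cause.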

Notes

  1. You may not want/need to modify the font in the edit window. I just mentioned it for completeness.
  2. You should not be setting -Dfile.encoding=UTF-8. From a Java bug report in 2005:

    The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.

    The preferred way to change the default encoding used by the VM and the runtime system is to change the locale of the underlying platform before starting your Java program.

  3. From the code and configuration details you provided in the question, "UTF-8" is being set in six different places. Once you have the application working, it might be worth taking the time to progressively remove them, to learn which of those settings are essential, and which don't matter.
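As an illustration of note 2, code that writes text can name its charset explicitly instead of depending on the JVM default. The following is a minimal sketch, not a fix for the original application: the class `ExplicitCharsetDemo` and the helper `printWith` are hypothetical names, and it assumes the `windows-1252` charset (Cp1252) is available in the JDK in use, as it is in common Oracle and OpenJDK builds. It uses only the `PrintStream(OutputStream, boolean, String)` constructor, which exists in Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Writes the text through a PrintStream that uses the given charset
    // and returns the bytes that were produced.
    static byte[] printWith(String text, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Should not happen: the name comes from an existing Charset object.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Reading the default is harmless; the point is not to depend on it.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // five Arabic letters

        // UTF-8 can represent every letter (two bytes each).
        byte[] utf8 = printWith(arabic, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 10

        // A Western single-byte charset cannot, so each letter becomes '?'.
        byte[] cp1252 = printWith(arabic, Charset.forName("windows-1252"));
        System.out.println(new String(cp1252, StandardCharsets.US_ASCII)); // ?????
    }
}
```

To write to the console this way, wrap `System.out` in a `PrintStream` created with an explicit charset. Whether the characters then display correctly still depends on the console's own encoding and font.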

Upvotes: 3
