mona

Reputation: 6279

How to handle non-ASCII Characters in Java while using PDPageContentStream/PDDocument

I am using PDFBox to create PDFs from my web application. The web application is built in Java and uses JSF. It takes the content of a web-based form and puts it into a PDF document.

Example: A user fills in an inputTextArea (JSF tag) in the form, and its contents are converted to a PDF. I am unable to handle non-ASCII characters.

How should I handle the non-ASCII characters, or at least strip them out before putting them in the PDF? Please help me with any suggestions or point me to any resources. Thanks!
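If stripping is an acceptable fallback, plain Java can do it without any extra libraries: first decompose accented characters with `java.text.Normalizer` so the accents become separate combining marks, drop those marks, then remove anything still outside the ASCII range. A minimal sketch (the class and method names are just for illustration):

```java
import java.text.Normalizer;

public class AsciiUtil {

    // Decompose accented characters (e.g. é -> e + combining acute),
    // drop the combining marks, then strip whatever is still non-ASCII.
    public static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String withoutMarks = decomposed.replaceAll("\\p{M}", "");
        return withoutMarks.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Héllo wörld")); // prints: Hello world
    }
}
```

Note that this is lossy: accents are flattened (é becomes e) and characters with no ASCII decomposition (e.g. €, Cyrillic, CJK) are removed outright, so handling the text properly as UTF-8 is usually the better option.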

Upvotes: 3

Views: 2050

Answers (1)

BalusC

Reputation: 1108712

Since you're using JSF on JSP instead of Facelets (which already uses UTF-8 implicitly), take the following steps to avoid the platform default charset being used (often ISO-8859-1, which is the wrong choice for the majority of "non-ASCII" characters):

  1. Add the following line to the top of all JSPs:

    <%@ page pageEncoding="UTF-8" %>
    

    This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter instructs the client (web browser) to display and submit the page with the form using UTF-8.

  2. Create a Filter which does the following in its doFilter() method:

    request.setCharacterEncoding("UTF-8");
    

    Map this on the FacesServlet as follows:

    <filter-mapping>
        <filter-name>nameOfYourCharacterEncodingFilter</filter-name>
        <servlet-name>nameOfYourFacesServlet</servlet-name>
    </filter-mapping>
    

    This sets the request encoding of all JSF POST requests to UTF-8.
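The filter from step 2 can be sketched as a complete class (a sketch assuming the javax.servlet API; register it in web.xml under the filter-name used in the mapping above):

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // Must run before any request parameter is read; once the container
        // has parsed the request body, setting the encoding has no effect.
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig config) throws ServletException {
        // No initialization needed.
    }

    @Override
    public void destroy() {
        // No cleanup needed.
    }
}
```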

This should fix the Unicode problem on the JSF side. I have never used PDFBox myself, but it is a standalone Apache library (it is not built on iText), and it supports Unicode provided you embed a font that actually contains glyphs for the characters in question; the standard built-in fonts only cover a limited character set. Let me know if it still doesn't work after applying the above fixes.
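On the PDFBox side, the usual way to handle arbitrary Unicode text is to embed a TrueType font via PDType0Font instead of using the standard 14 fonts. A minimal sketch, assuming the PDFBox 2.x API and a hypothetical font path:

```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

public class UnicodePdf {
    public static void main(String[] args) throws Exception {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);

            // The standard 14 fonts only cover WinAnsi; embed a TrueType
            // font that contains the glyphs you need. The path below is
            // hypothetical -- point it at a real .ttf on your system.
            PDType0Font font = PDType0Font.load(doc,
                    new File("/path/to/DejaVuSans.ttf"));

            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(font, 12);
                cs.newLineAtOffset(50, 700);
                cs.showText("Héllo wörld – déjà vu");
                cs.endText();
            }
            doc.save("unicode.pdf");
        }
    }
}
```

With a Type1 standard font, showText() would throw an exception for characters outside its encoding; an embedded Type0 font avoids that.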

Upvotes: 2
