Reputation: 603

Content from HTML missing in PDF created by ITextRenderer

I am trying to create a PDF from an HTML document that contains Chinese characters, and I have run into a weird problem: the line of the HTML that contains the Chinese characters is not shown completely in the generated PDF.

Below is my HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>some title.</title>

<style type="text/css">
     .name
   {
         font-family: "Arial Unicode MS";
         color:red;
         margin-left: 5px;
         margin-right: 5px
     }
</style>
</head>
<body>
 <b class="name">

LLTRN,DEBIT,,,6841,FXW,,CNY,PAY,C,,,,DD,,ord par nm,,,,,,,CN,百威英博雪津(三明)啤酒有限公司,,,,,,,CN,20140617,,CNY,647438.24,OUR,,,,,,,,SHANGHAI,CN,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

    <br>

RDF,FTX,TEXT
<br>
</b>
<br>
</body></html>

Below is my ITextRenderer code:

import java.io.StringReader;
import java.io.StringWriter;

import org.w3c.tidy.Tidy;
import org.xhtmlrenderer.pdf.ITextFontResolver;
import org.xhtmlrenderer.pdf.ITextRenderer;

// com.itextpdf.text.pdf.BaseFont instead if the iText 5 build of Flying Saucer is used
import com.lowagie.text.pdf.BaseFont;

// inputFileString holds the HTML shown above; os is the OutputStream the PDF is written to.

// Clean the HTML into well-formed XHTML with JTidy.
StringWriter writer = new StringWriter();
Tidy tidy = new Tidy();
tidy.setTidyMark(false);
tidy.setDocType("omit");
tidy.setXHTML(true);
tidy.setInputEncoding("utf-8");
tidy.setOutputEncoding("utf-8");
tidy.parse(new StringReader(inputFileString), writer);
writer.close();
String pdfContent = writer.toString();

// Create the iText renderer that generates the PDF from the XHTML document.
ITextRenderer renderer = new ITextRenderer();

// Register a font that contains Chinese glyphs; the family name must match the CSS font-family.
ITextFontResolver resolver = renderer.getFontResolver();
resolver.addFont("C:\\Windows\\Fonts\\arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

// Lay out the document and write the PDF.
renderer.setDocumentFromString(pdfContent);
renderer.layout();
renderer.createPDF(os);

Since I registered the font with the font resolver, the Chinese characters are shown, but the PDF is missing content. The last characters of that line (the "AI" from "SHANGHAI" and the following ",CN,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,") are not visible. It looks something like this:

html2pdf: content missing

I have tried a lot to find out what is wrong but could not find a solution. Can anybody help me resolve this issue, please? Thanks in advance!

Upvotes: 1

Answers (3)

user1650779

Reputation: 1

I added the CSS rules below to the body selector and it worked perfectly.

word-wrap: break-word; word-break: break-all;
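
As a minimal sketch of how these rules could be applied from Java, the helper below splices a style element carrying them into the HTML just before the closing head tag, so that the resulting string can be passed to renderer.setDocumentFromString as in the question. The class and method names are hypothetical, and it is an assumption (based only on the report in this answer) that the Flying Saucer version in use honors these properties.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Flying Saucer or iText.
final class WrapCssInjector {

    // The two rules from this answer, applied to the body element.
    static final String WRAP_STYLE =
            "<style type=\"text/css\">"
            + "body { word-wrap: break-word; word-break: break-all; }"
            + "</style>";

    // Matches the closing head tag regardless of letter case.
    private static final Pattern HEAD_END =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    /** Returns html with WRAP_STYLE inserted before the first closing head tag. */
    static String injectWrapCss(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            return html; // no head element found: leave the document untouched
        }
        return html.substring(0, m.start()) + WRAP_STYLE + html.substring(m.start());
    }
}
```

With the code from the question, this would be called as WrapCssInjector.injectWrapCss(pdfContent) before the document is handed to the renderer.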

"Adding whitespace" works sometimes (I tried adding spaces after symbols like 。 or 、), but when there are no such symbols the text still overflows.

Upvotes: 0

user3748331

Reputation:

You need to add the font type or the font file to your application.

You can find code in this question: "itextSharp - html to pdf some turkish characters are missing".

That question is the same as yours.

If this helps you, please give points.

Upvotes: 0

obourgain

Reputation: 9356

The issue is that Flying Saucer does not handle line wrapping in Chinese text. It only inserts line breaks at whitespace. In your case, that means it cannot insert a line break after "nm,,,,", so the rest of the text does not fit on the line.

It is a known bug in Flying Saucer (see here), but it's unlikely to be fixed soon.

The only workaround is to insert a whitespace character somewhere in your string after the Chinese characters. That will make all the text visible.
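
The workaround can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, Chinese characters are identified by the Unicode Han script, and a single space is inserted right after the last such character, which is just one possible reading of "anywhere after the Chinese characters".

```java
// Hypothetical helper for illustration; not part of Flying Saucer or iText.
final class HanBreakHelper {

    /**
     * Returns text with one space inserted directly after its last Han
     * character. The text is returned unchanged if it contains no Han
     * character, or if the last one already ends the string or is
     * followed by whitespace.
     */
    static String addSpaceAfterLastHan(String text) {
        int insertAt = -1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                insertAt = i; // index just past this Han character
            }
        }
        if (insertAt < 0
                || insertAt == text.length()
                || Character.isWhitespace(text.charAt(insertAt))) {
            return text; // nothing to insert
        }
        return text.substring(0, insertAt) + " " + text.substring(insertAt);
    }
}
```

Applied to the data line from the question before it is placed in the HTML, this puts a space between the last character of the company name and the commas that follow it, which gives the renderer a whitespace position to break at.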

Upvotes: 1
