Reputation: 57
I have an HTML document and want to convert it to an in-memory PDF, but I cannot find a good library for converting HTML to PDF.
I have tried this using ITextRenderer
and Jsoup,
but it throws an exception: Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 3; The markup in the document preceding the root element must be well-formed.
Here's my code:
Document document = Jsoup.parse(template, "UTF-8");
document.outputSettings().syntax(Document.OutputSettings.Syntax.html);
// Render into memory rather than to a file
ByteArrayOutputStream binaryOutput = new ByteArrayOutputStream();
ITextRenderer renderer = new ITextRenderer();
renderer.setDocumentFromString(document.html());
renderer.layout();
renderer.createPDF(binaryOutput);
Upvotes: 1
Views: 13660
Reputation: 142
A popular tool for HTML-to-PDF conversion is IronPDF for Java (also available for .NET).
With the addition of the following to pom.xml
(changing the version to the latest):
<dependencies>
    <dependency>
        <groupId>com.ironsoftware</groupId>
        <artifactId>ironpdf</artifactId>
        <version>2022.11.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.3</version>
    </dependency>
</dependencies>
I was able to render pixel-perfect PDFs that looked exactly like my HTML. An example is:
import com.ironsoftware.ironpdf.*;
import java.nio.file.Paths;
// Render the HTML as a PDF; the result is stored in myPdf as a PdfDocument
PdfDocument myPdf = PdfDocument.renderHtmlAsPdf("<h1> ~Hello World~ </h1> Made with IronPDF!");
// Save the PdfDocument to a file
myPdf.saveAs(Paths.get("html_saved.pdf"));
// Or with a local file:
myPdf = PdfDocument.renderHtmlFileAsPdf("example.html");
myPdf.saveAs(Paths.get("html_file_saved.pdf"));
// It even works with web pages:
myPdf = PdfDocument.renderUrlAsPdf("https://ironpdf.com");
myPdf.saveAs(Paths.get("url.pdf"));
Disclaimer: I am affiliated with IronPDF and am more than happy to answer any questions you may have about the software.
Upvotes: 1
Reputation: 51
You can try this class from iText's html2pdf add-on: com.itextpdf.html2pdf.HtmlConverter
With it, all you have to do is:
HtmlConverter.convertToPdf(tempFileHtml, tempFilePdf);
Then export the result. It copes well with malformed XML/HTML. I have used it and am happy with the results :)
Upvotes: 2
Reputation: 9473
You are searching for a way to render HTML and store the result as a PDF. In this question people tried to render XML (which is close to HTML; XHTML in fact is XML) and ultimately get it into a PDF: Java Render XML Document as PDF
As for your error message: it relates to your input document, which you did not show. The part of a document preceding the root element could look like this:
<?xml version="1.0"?>
<!-- comment -->
<?processinginstruction whatever parameters?>
<rootElement/>
So everything before <rootElement/>
is what your error message is pointing to. I guess you are feeding in an HTML document, and it may contain something ahead of the root element that the XML parser behind the renderer cannot accept. Unless you share that document with us, you will have to figure it out yourself.
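One way to narrow this down without any PDF library is to run the exact string you pass to the renderer through the JDK's own XML parser and see where it complains. The sketch below does that using only standard JDK classes; the class name PrologCheck and the method firstParseError are made up for illustration. The two sample strings are an assumption about what your input might look like: a lowercase <!doctype html> is not valid XML (the keyword is case-sensitive), so it is one plausible culprit for an error at line 1, column 3, but without seeing your document that is only a guess.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Flying Saucer or Jsoup.
public class PrologCheck {

    /**
     * Returns null if the markup is well-formed XML, otherwise a short
     * description of the first parse error (line, column, message).
     */
    public static String firstParseError(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Assumed sample inputs; replace them with the string you give the renderer.
        String htmlStyle = "<!doctype html>\n<html><body><p>Hi</p></body></html>";
        String xmlStyle = "<!DOCTYPE html>\n<html><body><p>Hi</p></body></html>";

        System.out.println("html-style prolog: " + firstParseError(htmlStyle));
        System.out.println("xml-style prolog:  " + firstParseError(xmlStyle));
    }
}
```

If the lowercase doctype does turn out to be the problem, it may be worth setting Jsoup's output syntax to Document.OutputSettings.Syntax.xml instead of Syntax.html, since that asks Jsoup to emit XML-style markup (closed void tags, uppercase DOCTYPE) that an XML parser is more likely to accept.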
Upvotes: 2