Meetra

Reputation: 1

Converting a DOCX file to PDF and sending it to the ServletResponse output stream

try {
    response.setHeader("Content-Disposition", "inline; filename=\"" + URLEncoder.encode(file.getFileName(), StandardCharsets.UTF_8) + ".pdf\"");
} catch (Exception e) {
    log.error("Error while setting the Content-Disposition header: ", e);
}
response.setCharacterEncoding("UTF-8");
response.setContentType("application/pdf");
try (InputStream inputStream = minioClient.getObject(GetObjectArgs.builder()
        .bucket(bucketName)
        .object(file.getS3Id())
        .build())) {

    if (file.getFileExtension() == FileExtension.PDF) {
        // Already a PDF: copy the bytes straight through
        IOUtils.copy(inputStream, response.getOutputStream());
        response.getOutputStream().flush();
    } else {
        // DOCX: convert with xdocreport; try-with-resources closes the document even on failure
        try (XWPFDocument doc = new XWPFDocument(inputStream)) {
            PdfOptions options = PdfOptions.create();
            options.fontEncoding("UTF-8");
            // Use the embedded Times New Roman font for every font the document requests
            options.fontProvider((familyName, encoding, size, style, color) -> {
                try {
                    BaseFont baseFont = BaseFont.createFont(
                            "classpath:fonts/Times_New_Roman.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED
                    );
                    Font font = new Font(baseFont, size, style, color);
                    if (familyName != null) {
                        font.setFamily(familyName);
                    }
                    return font;
                } catch (Exception e) {
                    log.error("Could not load font for family " + familyName, e);
                    return null;
                }
            });
            PdfConverter.getInstance().convert(doc, response.getOutputStream(), options);
            response.getOutputStream().flush();
        }
    }
}

I am getting an OutOfMemoryError (OOME):

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3537) ~[na:na]
    at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100) ~[na:na]
    at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130) ~[na:na]
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81) ~[na:na]
    at java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:127) ~[na:na]
    at com.lowagie.text.pdf.OutputStreamCounter.write(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at java.base/java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:161) ~[na:na]
    at com.lowagie.text.pdf.PdfStream.toPdf(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfIndirectObject.writeTo(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfWriter$PdfBody.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfWriter$PdfBody.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfWriter$PdfBody.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfWriter.addToBody(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfWriter.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfDocument.newPage(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfDocument.addPTable(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.pdf.PdfDocument.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at com.lowagie.text.Document.add(Unknown Source) ~[itext-2.1.7.jar!/:na]
    at fr.opensagres.xdocreport.itext.extension.ExtendedDocument.add(ExtendedDocument.java:114) ~[fr.opensagres.xdocreport.itext.extension-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.flushTable(StylableDocument.java:374) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.pageBreak(StylableDocument.java:141) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.columnBreak(StylableDocument.java:120) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.elements.StylableDocument.addElement(StylableDocument.java:101) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.PdfMapper.endVisitParagraph(PdfMapper.java:458) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.internal.PdfMapper.endVisitParagraph(PdfMapper.java:122) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor.visitParagraph(XWPFDocumentVisitor.java:412) ~[fr.opensagres.poi.xwpf.converter.core-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor.visitBodyElements(XWPFDocumentVisitor.java:264) ~[fr.opensagres.poi.xwpf.converter.core-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor.start(XWPFDocumentVisitor.java:216) ~[fr.opensagres.poi.xwpf.converter.core-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:57) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.doConvert(PdfConverter.java:39) ~[fr.opensagres.poi.xwpf.converter.pdf-2.1.0.jar!/:2.1.0]
    at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:42) ~[fr.opensagres.poi.xwpf.converter.core-2.1.0.jar!/:2.1.0]

Some of the DOCX files have complex structures, but none of them is larger than 500 KB. How can I fix this, or which other libraries could I use? I tried docx4j, but some of its dependencies use old versions of javax.xml.bind (JAXB) and I cannot resolve the conflicts. -Xmx is currently set to 1g; I also tried 2g, but it did not help.
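The trace above shows the PDF bytes ending up in a java.io.ByteArrayOutputStream beneath iText's OutputStreamCounter, and that buffer keeps growing (ByteArrayOutputStream.ensureCapacity), so the whole PDF appears to be accumulating in memory. What creates that buffer is not visible in the trace; a servlet filter or response wrapper is one possibility. As a diagnostic, here is a minimal sketch of a hypothetical CappedOutputStream (not part of iText, POI, or xdocreport) that counts bytes and aborts once a cap is exceeded, so a runaway conversion fails with an exception instead of exhausting the heap:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical diagnostic wrapper: forwards bytes to the wrapped stream and
 * throws an IOException as soon as more than maxBytes would have been written.
 */
public class CappedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    public CappedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureRoom(1);
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Overridden so bulk writes are forwarded in one call instead of byte by byte
        ensureRoom(len);
        out.write(b, off, len);
        written += len;
    }

    /** Number of bytes successfully forwarded so far. */
    public long bytesWritten() {
        return written;
    }

    private void ensureRoom(long len) throws IOException {
        if (written + len > maxBytes) {
            throw new IOException("Output exceeded " + maxBytes + " bytes; aborting conversion");
        }
    }
}
```

It would wrap the stream handed to the converter, for example PdfConverter.getInstance().convert(doc, new CappedOutputStream(response.getOutputStream(), 50L * 1024 * 1024), options). The 50 MB cap is an arbitrary example value, and the IOException may reach the caller wrapped in a library exception. If the memory is being consumed inside iText itself rather than in that downstream buffer, the cap will not prevent the error.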

Upvotes: 0

Views: 59

Answers (1)

OfficialHk

Reputation: 1

1. Increase the Java heap size further: you've tried 2 GB, but complex DOCX files with heavy tables, images, or embedded objects might require more memory. Try setting:

-Xms512m -Xmx4g
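Before concluding that more heap does not help, it is worth confirming that the new -Xmx value actually reaches the JVM that runs the conversion (a launcher script or container configuration can override it). A small sketch using the standard Runtime.maxMemory() call; the class name HeapCheck is only for illustration:

```java
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (roughly the -Xmx value). */
    static long maxHeapMiB() {
        // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with java -Xmx4g HeapCheck should print a value close to 4096 MiB; some garbage collectors report slightly less than the configured -Xmx. Logging the same number at application startup is a quick way to check the deployed process.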

2. Stream instead of buffering the entire document: write the output in chunks rather than holding everything in memory (XWPFDocument itself always loads the whole DOCX, so the savings come from the output side), and reduce unnecessary object retention by not keeping both the DOCX and PDF representations in memory at the same time. Code:
try (InputStream inputStream = minioClient.getObject(GetObjectArgs.builder()
        .bucket(bucketName)
        .object(file.getS3Id())
        .build());
     OutputStream outStream = response.getOutputStream();
     // Declared as a resource so the document is closed even if conversion fails
     XWPFDocument doc = new XWPFDocument(inputStream)) {

    PdfOptions options = PdfOptions.create();

    // Custom font provider: embed Times New Roman for every font the document requests
    options.fontProvider((familyName, encoding, size, style, color) -> {
        try {
            BaseFont baseFont = BaseFont.createFont(
                    "classpath:fonts/Times_New_Roman.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED
            );
            return new Font(baseFont, size, style, color);
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    });

    // Convert straight into the response stream; resources close in reverse order afterwards
    PdfConverter.getInstance().convert(doc, outStream, options);
    outStream.flush();
}
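A related option for the "don't keep the PDF in memory" point is to spool the converter's output to a temporary file and then copy that file to the response in chunks. A minimal, standard-library-only sketch; TempFileSpool and OutputWriter are hypothetical names, and the approach assumes the server has writable temp space:

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileSpool {

    /** Hypothetical callback: anything that writes its result to an OutputStream. */
    @FunctionalInterface
    public interface OutputWriter {
        void writeTo(OutputStream out) throws Exception;
    }

    /**
     * Runs the writer against a temporary file, then copies the file to target.
     * The finished output lives on disk rather than in a heap buffer.
     * Returns the number of bytes copied.
     */
    public static long spool(OutputWriter writer, OutputStream target) throws Exception {
        Path tmp = Files.createTempFile("convert-", ".pdf");
        try {
            // Closing the file stream flushes everything to disk before the copy starts
            try (OutputStream fileOut = new BufferedOutputStream(Files.newOutputStream(tmp))) {
                writer.writeTo(fileOut);
            }
            // Files.copy reads the file in fixed-size chunks, not all at once
            long copied = Files.copy(tmp, target);
            target.flush();
            return copied;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the code above it would be called roughly as TempFileSpool.spool(out -> PdfConverter.getInstance().convert(doc, out, options), response.getOutputStream()). This only keeps the heap free if nothing downstream buffers the whole response again; if the ByteArrayOutputStream seen in the stack trace belongs to a filter or response wrapper, that buffering would still need to be removed.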

Upvotes: 0
