nilez

Reputation: 51

How to create a preview image from a Microsoft Office document using Java

Currently, I am working with Microsoft Office documents: Word (doc, docx), PowerPoint (ppt, pptx), and Excel (xls, xlsx).

I would like to create a preview image from the first page of each document.

So far, only PowerPoint documents can be handled with the Apache POI library.

I cannot find a solution for the other types.

My idea is to convert the document to PDF (step 1) and then convert the PDF to an image (step 2).

For step 2 (PDF to image), there are many free Java libraries, e.g. PDFBox. It works fine with my dummy PDF file.

However, I have a problem with step 1.

My documents may contain text in several styles, tables, images, or embedded objects.

[Image: first page of a sample Word document]

Which open-source Java library can do this task?

I have tried the following libraries:

JODConverter - The output looks fine, but it requires OpenOffice.

docx4j - I'm not sure whether it works with non-OOXML formats (doc, xls, ppt), or whether it is really free. Here is my example code:

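import java.io.*;
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutDocx4j.pdf";
// try-with-resources closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(inputWordPath);
     OutputStream os = new FileOutputStream(outputPDFPath)) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
    // Map the fonts named in the document to fonts available on this machine
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    Docx4J.toPDF(wordMLPackage, os);
} catch (Exception e) {
    e.printStackTrace();
}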

The output looks OK, but the generated PDF contains the watermark "## Evaluation Use only ##".

xdocreport - The generated PDF does not contain the images.

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutXDOCReport.pdf";
// try-with-resources closes the streams and the document when done
try (InputStream is = new FileInputStream(inputWordPath);
     XWPFDocument document = new XWPFDocument(is);
     OutputStream out = new FileOutputStream(outputPDFPath)) {
    PdfOptions options = PdfOptions.create();
    PdfConverter.getInstance().convert(document, out, options);
}

I cannot find a suitable library for the task.
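Because the inputs mix legacy binary files (doc, xls, ppt) with OOXML files (docx, xlsx, pptx), it can help to check which container a file really uses before handing it to a library that supports only one family. The sketch below is only an illustration using the JDK: OOXML files are ZIP archives, and the legacy formats use the OLE2 compound file signature. The class and enum names are hypothetical, and a ZIP signature alone does not prove that the file is OOXML (any ZIP archive starts the same way).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Sketch: tell ZIP-based (OOXML) files from legacy OLE2 binary Office files. */
final class OfficeContainerSniffer {

    enum Container { ZIP_BASED, OLE2_BINARY, UNKNOWN }

    // First bytes of a ZIP local file header ("PK" 03 04); docx, xlsx and pptx are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound file signature used by legacy doc, xls and ppt files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Classifies a file by the leading bytes passed in. */
    static Container sniff(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP_BASED;
        }
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2_BINARY;
        }
        return Container.UNKNOWN;
    }

    /** Reads just enough bytes from the file to classify it (requires Java 11+). */
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller could, for example, send ZIP_BASED files to an OOXML-only library and route OLE2_BINARY files to a converter that understands the legacy formats.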

Upvotes: 5

Views: 5105

Answers (3)

Osiris Team

Reputation: 101

Here is @sbraconnier's solution adapted to newer JODConverter versions, with the conversion done in memory via streams:

import org.jodconverter.core.document.DefaultDocumentFormatRegistry;
import org.jodconverter.core.office.OfficeException;
import org.jodconverter.local.LocalConverter;
import org.jodconverter.local.office.LocalOfficeManager;
import org.jodconverter.local.filter.PagesSelectorFilter;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class Office {
    // Create an office manager using the default configuration.
    // The default port is 2002. Note that when an office manager
    // is installed, it will be the one used by default when
    // a converter is created.
    final public static LocalOfficeManager officeManager = LocalOfficeManager.install();
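    static {
        // Start an office process and connect to the started instance (on port 2002).
        try {
            officeManager.start();
            // Stop the office process when the JVM shuts down.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    officeManager.stop();
                } catch (OfficeException e) {
                    // Report the failure instead of silently swallowing it.
                    e.printStackTrace();
                }
            }));
        } catch (OfficeException e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }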

    /**
     * @param inputFile stream of the source document (e.g. document.docx)
     * @return PNG bytes of the first-page preview image (e.g. document.png)
     */
    public static byte[] createPreview(InputStream inputFile) throws OfficeException {
        final ByteArrayOutputStream outputFile = new ByteArrayOutputStream();

        // Create a page selector filter in order to
        // convert only the first page.
        final PagesSelectorFilter selectorFilter = new PagesSelectorFilter(1);

        LocalConverter
                .builder()
                .filterChain(selectorFilter)
                .build()
                .convert(inputFile)
                .to(outputFile)
                .as(DefaultDocumentFormatRegistry.PNG)
                .execute();
        return outputFile.toByteArray();
    }
}
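Since createPreview returns the PNG as a byte array, the full-size page can be scaled down to thumbnail size with the JDK alone. The following is a minimal sketch, not part of JODConverter: the class name PreviewThumbnails and the method scaleToWidth are hypothetical, and bilinear interpolation is just one reasonable default.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Sketch: shrink a rendered page image to a thumbnail, keeping its aspect ratio. */
final class PreviewThumbnails {

    /** Returns PNG bytes scaled down to targetWidth; smaller images are returned unchanged. */
    static byte[] scaleToWidth(byte[] pngBytes, int targetWidth) throws IOException {
        if (targetWidth <= 0) {
            throw new IllegalArgumentException("targetWidth must be positive");
        }
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(pngBytes));
        if (source == null) {
            throw new IOException("Input is not a readable image");
        }
        if (source.getWidth() <= targetWidth) {
            return pngBytes; // never upscale
        }
        // Derive the height from the width so the page proportions are preserved.
        int targetHeight = Math.max(1,
                Math.round(source.getHeight() * (float) targetWidth / source.getWidth()));

        BufferedImage scaled =
                new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(scaled, "png", out);
        return out.toByteArray();
    }
}
```

For example, PreviewThumbnails.scaleToWidth(Office.createPreview(in), 200) would yield a 200-pixel-wide thumbnail of the first page.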

Upvotes: 0

Tilal Ahmad

Reputation: 939

You can try the GroupDocs.Conversion Cloud SDK for Java. Its free plan provides 50 free credits per month, and it supports conversion of all common file formats.

Sample DOCX to Image stream conversion code:

// Get App Key and App SID from https://dashboard.groupdocs.cloud/
ConvertApi apiInstance = new ConvertApi(AppSID,AppKey);
try {

    ConvertSettings settings = new ConvertSettings();

    settings.setStorageName(Utils.MYStorage);
    settings.setFilePath("conversions\\password-protected.docx");
    settings.setFormat("jpeg");

    DocxLoadOptions loadOptions = new DocxLoadOptions();
    loadOptions.setPassword("password");
    loadOptions.setHideWordTrackedChanges(true);
    loadOptions.setDefaultFont("Arial");

    settings.setLoadOptions(loadOptions);

    JpegConvertOptions convertOptions = new JpegConvertOptions();
    convertOptions.setFromPage(1);
    convertOptions.setPagesCount(1);
    convertOptions.setGrayscale(false);
    convertOptions.setHeight(1024);
    convertOptions.setQuality(100);
    convertOptions.setRotateAngle(90);
    convertOptions.setUsePdf(false);
    settings.setConvertOptions(convertOptions);

    // Setting OutputPath to an empty string returns the output as a document stream
    settings.setOutputPath("");

    // convert to specified format
    File response = apiInstance.convertDocumentDownload(new ConvertDocumentRequest(settings));
    System.out.println("Document converted successfully: " + response.length());
} catch (ApiException e) {
    System.err.println("Exception while calling ConvertApi:");
    e.printStackTrace();
}

I am a developer evangelist at Aspose.

Upvotes: 3

sbraconnier

Reputation: 465

If you can afford having a LibreOffice (or Apache OpenOffice) installation, JODConverter should do the trick just fine (and for free).

Note that the latest version of JODConverter available in the Maven Central Repository offers a feature called Filters that allows you to convert only the first page easily, and it supports conversion to PNG out of the box. Here's a quick example of how to do so:

// Create an office manager using the default configuration.
// The default port is 2002. Note that when an office manager
// is installed, it will be the one used by default when
// a converter is created.
final LocalOfficeManager officeManager = LocalOfficeManager.install(); 
try {

    // Start an office process and connect to the started instance (on port 2002).
    officeManager.start();

    final File inputFile = new File("document.docx");
    final File outputFile = new File("document.png");

    // Create a page selector filter in order to
    // convert only the first page.
    final PageSelectorFilter selectorFilter = new PageSelectorFilter(1);

    LocalConverter
      .builder()
      .filterChain(selectorFilter)
      .build()
      .convert(inputFile)
      .to(outputFile)
      .execute();
} finally {
    // Stop the office process
    LocalOfficeUtils.stopQuietly(officeManager);
}

As for your question:

Can I set size of pdf to fit with converted document content

If you can do it using LibreOffice or Apache OpenOffice without JODConverter, then you can do it with JODConverter. You just have to find out how it can be done programmatically, and then create a filter to use with JODConverter.

I won't go into details here since you may choose another way, but if you need further assistance, just ask on the Gitter Community of the project.
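To check what LibreOffice can do on its own before writing a JODConverter filter, a conversion can be run through its headless command line. The sketch below is only an illustration: it assumes the office binary (for example "soffice") is installed and reachable, the class and method names are hypothetical, and whether a given target format renders only the first page depends on the LibreOffice export filter.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Sketch: run a LibreOffice headless conversion without JODConverter. */
final class HeadlessOfficeCli {

    /** Builds: <binary> --headless --convert-to <format> --outdir <dir> <input> */
    static List<String> buildCommand(String officeBinary, Path input,
                                     Path outputDir, String targetFormat) {
        return List.of(
                officeBinary,
                "--headless",
                "--convert-to", targetFormat,
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /** Runs the conversion; returns true only if the process exits with code 0 in time. */
    static boolean convert(String officeBinary, Path input, Path outputDir,
                           String targetFormat, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(officeBinary, input, outputDir, targetFormat))
                .redirectErrorStream(true)
                // Discard output so the child cannot block on a full pipe (Java 9+).
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hung office process
            return false;
        }
        return process.exitValue() == 0;
    }
}
```

Calling HeadlessOfficeCli.convert("soffice", Path.of("document.docx"), Path.of("out"), "png", 60) would then write the converted file into the out directory if the conversion succeeds. Starting a new office process for every file is slow, which is one reason to prefer JODConverter's managed process for anything beyond experiments.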

Upvotes: 4
