We are Borg

Reputation: 5311

Java : Convert from doc to pdf and ppt to pdf failing

For our Java project I am looking into converting Office files to PDF, and subsequently to images. So far I have working conversions to images for pptx, docx, xls, xlsx and pdf. If anyone needs working code for any of these, let me know.

Unfortunately, DOC to PDF and PPT to PDF are not working. I have tried multiple solutions, but none of them worked. The latest one I tried is JODConverter, but that failed as well: the library was unable to connect to LibreOffice, which I am running on the given port.

Can anyone suggest a reliable, free way to convert DOC and PPT to PDF?

Code :

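    private String createDocToPDfAndThenToImage(String path) {
        OpenOfficeConnection connection = new SocketOpenOfficeConnection("127.0.0.1", 8100);
        try {
            File inputFile = new File(path);
            // The first argument is a file-name prefix, not a path;
            // the file is created in the default temp directory.
            File outputFile = File.createTempFile("jodtest", ".pdf");

            // This is the call that throws the ConnectException shown below
            connection.connect();

            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(inputFile, outputFile);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the connection even if the conversion fails
            if (connection.isConnected()) {
                connection.disconnect();
            }
        }
        return "";
    }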

Error log :

java.net.ConnectException: connection failed: socket,host=127.0.0.1,port=8100,tcpNoDelay=1: java.net.ConnectException: Connection refused
    at com.artofsolving.jodconverter.openoffice.connection.AbstractOpenOfficeConnection.connect(AbstractOpenOfficeConnection.java:79)
    at com.journaldev.spring.service.GroupAttachmentsServiceImpl.createDocToPDfAndThenToImage(GroupAttachmentsServiceImpl.java:406)
    at com.journaldev.spring.service.GroupAttachmentsServiceImpl.addAttachment(GroupAttachmentsServiceImpl.java:338)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)

A headless instance was already started with this command :

 /usr/bin/libreoffice --headless --accept=socket,host=localhost,port=8100;

If there is no way to fix this, any other conversion mechanism would work. Kindly let me know. Thank you.

Upvotes: 1

Views: 2425

Answers (2)

Kirill Induchnyj

Reputation: 1

Use OpenOffice. Download and install it, then use the following Groovy code to convert doc/docx to PDF.

Dependencies (Gradle):

    compile "org.openoffice:bootstrap-connector:0.1.1"
    compile "org.openoffice:unoil:4.1.2"
    compile "org.openoffice:ridl:4.1.2"
    compile "org.openoffice:jurt:4.1.2"
    compile "org.openoffice:juh:4.1.2"

Converter class:

    package com.galantis.ecm.converter

    import com.galantis.ecm.api.object.model.BaseContent
    import org.apache.commons.io.FileUtils

    import com.sun.star.beans.PropertyValue
    import com.sun.star.frame.XComponentLoader
    import com.sun.star.frame.XDesktop
    import com.sun.star.frame.XStorable
    import com.sun.star.lang.XComponent
    import com.sun.star.lang.XMultiComponentFactory
    import com.sun.star.uno.UnoRuntime
    import com.sun.star.uno.XComponentContext
    import ooo.connector.BootstrapSocketConnector

    class Docx2PdfConverter extends Converter {
      InputStream convert(BaseContent content) {
        // Write the incoming document to a temporary file that OpenOffice can open
        byte[] bytes = content.inputStream.bytes
        def file = new File(FileUtils.getTempDirectory(), "doc.docx")
        FileUtils.writeByteArrayToFile(file, bytes)

        // Start (or connect to) a local OpenOffice instance; adjust the path to your installation
        String oooExeFolder = "C:/Program Files (x86)/OpenOffice 4/program"
        XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder)
        XMultiComponentFactory xMCF = xContext.getServiceManager()
        Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext)
        XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop)
        XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class, xDesktop)

        // Load the document without showing a window; file URLs need forward slashes
        String sUrl = "file:///" + file.absolutePath.replace('\\', '/')
        PropertyValue[] propertyValues = new PropertyValue[1]
        propertyValues[0] = new PropertyValue()
        propertyValues[0].Name = "Hidden"
        propertyValues[0].Value = Boolean.TRUE
        XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues)

        XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp)

        // Export settings: overwrite an existing file and use the Writer PDF export filter
        propertyValues = new PropertyValue[2]
        propertyValues[0] = new PropertyValue()
        propertyValues[0].Name = "Overwrite"
        propertyValues[0].Value = Boolean.TRUE
        propertyValues[1] = new PropertyValue()
        propertyValues[1].Name = "FilterName"
        propertyValues[1].Value = "writer_pdf_Export"

        // Store the PDF in the temp directory instead of a hard-coded drive path
        def outPutPdf = new File(FileUtils.getTempDirectory(), "pdf.pdf")
        xStorable.storeToURL("file:///" + outPutPdf.absolutePath.replace('\\', '/'), propertyValues)

        // Read the result into memory before shutting OpenOffice down
        def result = new ByteArrayInputStream(FileUtils.readFileToByteArray(outPutPdf))
        xDesktop.terminate()
        result
      }
    }

Upvotes: 0

fqye

Reputation: 41

I am working on something similar. I use unoconv, which requires LibreOffice. It works very reliably on PPT/PPTX. It also works on DOC/DOCX, but it sometimes gets stuck and needs manual intervention.
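
Since the question is about Java, here is a minimal sketch of calling unoconv from Java with a timeout, so that a stuck conversion is killed instead of blocking forever. The class name `UnoconvRunner`, its method names, and the 120-second limit are made up for illustration, and it assumes `unoconv` is on the PATH and Java 9 or later. Killing unoconv may still leave the LibreOffice process it started running; that is not handled here.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of unoconv or any library.
class UnoconvRunner {

    /** Builds the command line "unoconv -f pdf <file>" (assumes unoconv is on the PATH). */
    static List<String> buildCommand(Path input) {
        return List.of("unoconv", "-f", "pdf", input.toString());
    }

    /**
     * Runs a command and returns its exit code, or -1 if the process had to be
     * killed because it did not finish within the timeout.
     */
    static int runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                // Discard output so a chatty process cannot block on a full pipe
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            process.waitFor(); // wait until the process is really gone
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: UnoconvRunner <file>");
            return;
        }
        int result = runWithTimeout(buildCommand(Paths.get(args[0])), 120);
        System.out.println(result == 0 ? "conversion succeeded"
                : result == -1 ? "conversion timed out" : "conversion failed: " + result);
    }
}
```

A return value of -1 can then be treated like the failed case, for example by retrying once or flagging the file for manual inspection.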

The script below loops through a folder and processes new PPT/PPTX files. It is used in production. You can add DOC/DOCX as well. I have a Java application listening for file-creation events on the folder, so it can handle each PDF once its conversion is done.

All these status flag files are for human intervention and status check by other scripts. You also need to add some code for old file clean up.
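
For the cleanup part, something along these lines might do. It is only a sketch: the class `OldFileCleanup`, the seven-day limit, and the list of suffixes are assumptions, and the suffix filter is there so that the script and the unoconv executable in the same folder are never touched.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical helper for removing old inputs, outputs and flag files.
class OldFileCleanup {
    // Only files with these endings are candidates for deletion (assumed list)
    static final List<String> SUFFIXES =
            List.of(".ppt", ".pptx", ".pdf", ".started", ".failed", ".succeeded");

    /** True if the file name ends with one of the managed suffixes. */
    static boolean isManaged(String fileName) {
        return SUFFIXES.stream().anyMatch(fileName::endsWith);
    }

    /** True if a file last modified at lastModified is older than maxAge at time now. */
    static boolean isExpired(Instant lastModified, Instant now, Duration maxAge) {
        return lastModified.plus(maxAge).isBefore(now);
    }

    /** Deletes managed regular files in dir older than maxAge; returns how many were deleted. */
    static int deleteOlderThan(Path dir, Duration maxAge) throws IOException {
        Instant now = Instant.now();
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && isManaged(file.getFileName().toString())
                        && isExpired(Files.getLastModifiedTime(file).toInstant(), now, maxAge)) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: OldFileCleanup <directory>");
            return;
        }
        int deleted = deleteOlderThan(Paths.get(args[0]), Duration.ofDays(7));
        System.out.println("Deleted " + deleted + " file(s)");
    }
}
```

It could be run from cron or from a scheduled task inside the Java application.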

By the way, I am very interested in how you convert the files to PDF and then to images. Maybe it can give me hints for finding a better solution for my project. Thanks.

    #!/bin/bash

    # Endless conversion loop: converts new PPT/PPTX files in the current folder to PDF

    echo "Convert files to pdf..."

    while true
    do
        # Oldest files first; assumes file names contain no whitespace
        for file in $(ls -tr | grep '\.\(ppt\|pptx\)$')
        do
            if [ -e "$file.failed" ] || [ -e "$file.succeeded" ]
            then
                continue # already converted
            fi
            echo "$file" > "$file.started"
            echo "$file" > convert.busy # create busy converting flag file
            output=$(./unoconv -f pdf "$file")
            result=$?
            echo "$result"
            if [ "$result" -ne 0 ]
            then
                echo "conversion to pdf failed"
                echo "$file" > "$file.failed"
            else
                echo "conversion to pdf succeeded"
                echo "$file" > "$file.succeeded"
            fi
            rm convert.busy # remove busy converting flag file
        done
        sleep 0.2
    done
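
The Java side that listens on the folder can be sketched roughly as follows. This is an illustration rather than the production code: the class `ConvertedPdfWatcher` and its method names are made up, and it assumes unoconv writes `deck.pdf` next to `deck.pptx`. It reacts to the `.succeeded` flag files rather than to the PDF itself, because the flag is only written after unoconv has returned, so the PDF should be complete by then.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Hypothetical watcher; reacts to the *.succeeded flag files written by the script.
class ConvertedPdfWatcher {
    static final String FLAG_SUFFIX = ".succeeded";

    /**
     * Maps a flag file such as deck.pptx.succeeded to the expected output deck.pdf.
     * The file name must end with FLAG_SUFFIX.
     */
    static Path pdfFor(Path flag) {
        String name = flag.getFileName().toString();
        String source = name.substring(0, name.length() - FLAG_SUFFIX.length());
        int dot = source.lastIndexOf('.');
        String base = dot > 0 ? source.substring(0, dot) : source;
        return flag.resolveSibling(base + ".pdf");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: ConvertedPdfWatcher <directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watcher.take(); // blocks until something is created
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a full rescan would go here
                    }
                    Path created = dir.resolve((Path) event.context());
                    if (created.getFileName().toString().endsWith(FLAG_SUFFIX)) {
                        System.out.println("PDF ready: " + pdfFor(created));
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

The `println` is where the real handling of the converted PDF would go, for example the PDF-to-image step.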

Upvotes: 1
