user2211381

Reputation: 31

How to convert .doc and .docx files to PDF in Java programmatically

I am able to generate a PDF from a .docx file using docx4j, but I also need to convert .doc files to PDF, including images and tables. Is there any way to convert .doc to .docx in Java, or to convert .doc directly to PDF?

Upvotes: 3

Views: 14213

Answers (4)

hd1

Reputation: 34657

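Cribbing off the POI unit tests, I came up with this to extract the text from a Word document. Note that it only extracts text; it does not produce a PDF:

// Requires Apache POI (poi and poi-scratchpad). IOUtils is assumed here to be
// org.apache.poi.util.IOUtils; the Commons IO class of the same name also works.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.util.IOUtils;

// Reads the first entry of a zip archive and parses it as a binary .doc, so it
// expects a zipped .doc. For a plain .doc file, pass the FileInputStream
// straight to the HWPFDocument constructor instead.
public String getText(String document) {
    try (ZipInputStream is = new ZipInputStream(new FileInputStream(document))) {
        is.getNextEntry();
        // Buffer the entry in memory before handing it to POI.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        IOUtils.copy(is, baos);

        HWPFDocument doc = new HWPFDocument(new ByteArrayInputStream(baos.toByteArray()));
        WordExtractor extractor = new WordExtractor(doc);
        return extractor.getText();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}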

I do hope that points you in the right direction, if not sorts you entirely.

Upvotes: 2

Chunky Gupta

Reputation: 1

This GitHub repository might be helpful: https://github.com/guptachunky/Conversion-Work

The conversion code is in https://github.com/guptachunky/Conversion-Work/blob/main/src/main/java/com/conversion/Conversion/Service/ConversionService.java

// Note: XWPFDocument reads .docx (OOXML) files only, so despite its name this
// method does not handle binary .doc files. PdfConverter and PdfOptions appear
// to come from the XDocReport POI-to-PDF converter.
public void docToPdf(FileDetail fileDetail, HttpServletResponse response) {
    try {
        File docFile = converterToFile(fileDetail.getFile());
        File file = File.createTempFile("output", ".pdf");
        // Close every stream before the PDF is read back from disk.
        try (InputStream doc = new FileInputStream(docFile);
             XWPFDocument document = new XWPFDocument(doc);
             OutputStream out = new FileOutputStream(file)) {
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        }
        getClaimFiles(file, response);
    } catch (IOException e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}

// Sends the generated PDF to the client as a download.
public void getClaimFiles(File file, HttpServletResponse response) {
    try {
        response.setContentType("application/pdf");
        response.setHeader("Content-Disposition",
                "attachment; filename=dummy.pdf");
        response.getOutputStream().write(Files.readAllBytes(file.toPath()));
    } catch (Exception e) {
        response.setStatus(AppConstant.SOMETHING_WENT_WRONG);
    }
}

Upvotes: 0

JasonPlutext

Reputation: 15863

docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production-ready code. Last I checked, there were limits to POI's HWPF parsing of binary .doc files.

Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.
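As a rough illustration of the LibreOffice route (without jodconverter), here is a minimal sketch that shells out to LibreOffice in headless mode. It assumes the soffice executable is installed and on the PATH; the class name LibreOfficePdf, its method names, and the two-minute timeout are made up for this example. It also assumes LibreOffice writes the PDF into the output directory under the input's base name.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts .doc or .docx to PDF via the LibreOffice CLI.
final class LibreOfficePdf {

    // Builds the headless conversion command; "soffice" is assumed to be on the PATH.
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList(
                "soffice",
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outputDir.toString(),
                input.toString());
    }

    // Assumed output location: same base name as the input, with a .pdf extension.
    static Path expectedOutput(Path input, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".pdf");
    }

    // Runs the conversion and returns the path where the PDF is expected.
    static Path convert(Path input, Path outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO()
                .start();
        // Arbitrary timeout so a hung office process does not block forever.
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("LibreOffice timed out converting " + input);
        }
        if (process.exitValue() != 0) {
            throw new IOException("LibreOffice exited with code " + process.exitValue());
        }
        return expectedOutput(input, outputDir);
    }
}
```

Calling LibreOfficePdf.convert(Paths.get("report.doc"), Paths.get("out")) should leave out/report.pdf behind if the conversion succeeds. Starting a new office process per file is slow, which is the main reason to look at jodconverter for anything beyond occasional conversions.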

Upvotes: 3

Jabir

Reputation: 2866

You can use jWordConvert for this.

jWordConvert is a Java library that can read and render Word documents natively to convert to PDF, to convert to images, or to print the documents automatically.

Details can be found at the following link: http://www.qoppa.com/wordconvert/

Upvotes: 0
