Kitchen

Reputation: 83

Docx to pdf using pandoc in python

I am quite new to Python, so this may be a silly question, but I can't seem to find the solution anywhere.

I have a Django site that I am running locally on my machine for development. On the site I want to convert a docx file to PDF, and I want to use pandoc to do this. I know there are other methods, such as online APIs or Python modules like "docx2pdf", but I want to use pandoc for deployment reasons.

I installed pandoc from my terminal using brew install pandoc, so it should be installed correctly.

In my Django project I am doing:

import pypandoc
import docx
from django.http import FileResponse

def making_a_doc_function(request):
    doc = docx.Document()
    doc.add_heading("MY DOCUMENT")
    doc.save('thisisdoc.docx')
    pypandoc.convert_file('thisisdoc.docx', 'docx', outputfile="thisisdoc.pdf")
    pdf = open('thisisdoc.pdf', 'rb')
    response = FileResponse(pdf)
    return response

The docx file gets created without a problem, but no PDF is created. I am getting an error that says:

Pandoc died with exitcode "4" during conversion: b'cannot produce pdf output from docx\n'

Does anyone have any ideas?

Upvotes: 3

Views: 13645

Answers (2)

D4ario0

Reputation: 1

Pypandoc does not support direct conversion from Word to PDF: it loses a lot of information and formatting, so it cannot be done using only pypandoc.

You can use rocketpdf for basic pdf manipulation on a backend.

pip install rocketpdf

Using the CLI via the subprocess module:

import subprocess

command = ["rocketpdf", "parsedoc", "report.docx", "--output", "report.pdf"]
result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
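With check=True, a failing command raises an exception whose default message omits the tool's stderr. A minimal sketch of a wrapper that surfaces it might look like the following; run_cli is a hypothetical helper name, not part of rocketpdf or the standard library, and the self-check at the bottom uses the current Python interpreter rather than rocketpdf.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command list; return its stdout, or raise with stderr attached."""
    try:
        result = subprocess.run(
            command, check=True, capture_output=True, text=True
        )
    except FileNotFoundError as exc:
        # The executable itself is missing from PATH.
        raise RuntimeError(f"executable not found: {command[0]}") from exc
    except subprocess.CalledProcessError as exc:
        # The tool ran but returned a non-zero exit code.
        raise RuntimeError(
            f"{command[0]} exited with {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return result.stdout


if __name__ == "__main__":
    # Harmless self-check that does not need rocketpdf installed.
    print(run_cli([sys.executable, "-c", "print('ok')"]))
```

The command list from the snippet above can be passed to it unchanged, for example run_cli(command).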

Upvotes: 0

tarleb

Reputation: 22659

The second argument to convert_file is the output format or, in this case, the format through which pandoc generates the PDF. Pandoc doesn't know how to produce a PDF through docx, hence the error.

Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.
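As a rough sketch of how the corrected call could be wrapped for reuse: pandoc produces PDFs through an external engine (pdflatex by default), so the helper below checks for one first. The names pdf_output_path and docx_to_pdf are hypothetical, and the sketch assumes a pandoc version that accepts --pdf-engine (2.0 or later) and that a LaTeX distribution is installed.

```python
import shutil
from pathlib import Path


def pdf_output_path(docx_path):
    """Return the path of the .pdf file that sits next to docx_path."""
    return str(Path(docx_path).with_suffix(".pdf"))


def docx_to_pdf(docx_path, pdf_engine="pdflatex"):
    """Convert docx_path to a PDF beside it and return the PDF path."""
    # Fail early with a readable message if the engine is not on PATH.
    if shutil.which(pdf_engine) is None:
        raise RuntimeError(f"PDF engine {pdf_engine!r} not found on PATH")

    # Imported lazily so pdf_output_path works without pypandoc installed.
    import pypandoc

    out = pdf_output_path(docx_path)
    pypandoc.convert_file(
        docx_path,
        "pdf",  # the output format, not the input format
        outputfile=out,
        extra_args=[f"--pdf-engine={pdf_engine}"],
    )
    return out
```

In the view from the question, the convert_file line would then become docx_to_pdf('thisisdoc.docx'), and the returned path can be opened and passed to FileResponse as before.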

Upvotes: 4
