Jay chuks

Reputation: 389

Convert pdf to docx format in python

How do I convert a PDF to DOCX in Python? I tried converting it to HTML with pdfminer to extract the text, but the result still doesn't look good enough.

Upvotes: 2

Views: 1504

Answers (1)

thrinadhn

Reputation: 2523

pdf2docx

  1. Install the pdf2docx package.

Installation

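  • Install it from PyPI with pip, or clone/download the pdf2docx source and install it from there:

     pip install pdf2docx

     # or, from a cloned/downloaded copy of the source, install into your environment
     # (`pip install .` in the source folder is the modern equivalent)
     python setup.py install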
    
  • Option 1: use the `Converter` class

    from pdf2docx import Converter
    
    pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document1.pdf'  # source file
    docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample.docx'    # destination file
    
    # convert pdf to docx (start=0, end=None converts every page; pages are zero-based)
    cv = Converter(pdf_file)
    try:
        cv.convert(docx_file, start=0, end=None)
    finally:
        cv.close()  # release the PDF even if the conversion raises
    
    # Output (console log for a 53-page PDF):
    # Parsing Page 53: 53/53...
    # Creating Page 53: 53/53...
    # --------------------------------------------------
    # Terminated in 6.258919400000195s.
    
  • Option 2: use the `parse()` function

    from pdf2docx import parse
    
    pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document2.pdf'   # source file
    docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample_2.docx'   # destination file
    
    # convert pdf to docx in a single call (a shortcut for the Converter steps in Option 1)
    parse(pdf_file, docx_file, start=0, end=None)
    
    # Output (console log for a 53-page PDF):
    # Parsing Page 53: 53/53...
    # Creating Page 53: 53/53...
    # --------------------------------------------------
    # Terminated in 5.883666100000482s.
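
If you have more than one file to convert, the Option 1 pattern can be wrapped in a small helper. This is only a sketch, not part of pdf2docx: the names `docx_path_for`, `convert_one` and `convert_folder` are hypothetical. It relies only on the calls shown above (`Converter(pdf_file)`, `convert(docx_file, start=..., end=...)` and `close()`), closes the converter in a `finally` block even if parsing fails, and by default writes each `.docx` next to its PDF unless an output folder is given.

```python
from pathlib import Path


def docx_path_for(pdf_path, out_dir=None):
    """Hypothetical helper: the .docx path next to pdf_path, or inside out_dir."""
    pdf_path = Path(pdf_path)
    target_dir = Path(out_dir) if out_dir is not None else pdf_path.parent
    return target_dir / pdf_path.with_suffix(".docx").name


def convert_one(pdf_path, docx_path=None, start=0, end=None):
    """Hypothetical helper: convert one PDF with pdf2docx.Converter, always closing it."""
    # Imported lazily so docx_path_for() works even without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = Path(docx_path) if docx_path is not None else docx_path_for(pdf_path)
    docx_path.parent.mkdir(parents=True, exist_ok=True)  # make sure the target folder exists

    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(docx_path), start=start, end=end)
    finally:
        cv.close()  # free the PDF even if conversion raises
    return docx_path


def convert_folder(folder, out_dir=None):
    """Hypothetical helper: convert every *.pdf directly inside folder; return written paths."""
    written = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        written.append(convert_one(pdf_path, docx_path_for(pdf_path, out_dir)))
    return written
```

For example, `convert_folder(r'C:\Users\ABCD\Desktop\XYZ')` would write one `.docx` beside each PDF in that folder. Note that `glob("*.pdf")` is case-sensitive on most platforms, so files ending in `.PDF` may need an extra pattern.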
    

Upvotes: 5
