Grom ila

Reputation: 1

Unreadable text when converting PDF to DOCX with tabula

When I convert a PDF to DOCX, the text in the resulting file is unreadable. I extract the tables with tabula and write them out with python-docx. I also tried pdf2docx, and the text was unreadable there as well.

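import tabula
from docx import Document

# read_pdf returns a list of pandas DataFrames, one per detected table
tables = tabula.read_pdf(input_path='2.pdf',
                         pages='all')
doc = Document()
for df in tables:
    # One extra row for the column headers, which live in df.columns
    docx_table = doc.add_table(rows=len(df) + 1,
                               cols=len(df.columns),
                               style='Table Grid')
    for j, name in enumerate(df.columns):
        docx_table.cell(0, j).text = str(name)
    # Data rows start at row 1, below the header row
    for i, row in enumerate(df.itertuples(index=False), start=1):
        for j, value in enumerate(row):
            docx_table.cell(i, j).text = str(value)

    doc.add_paragraph()  # blank paragraph between tables

doc.save('123.docx')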
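
One thing that may be worth ruling out separately from the garbled text: when the DataFrames from tabula are written cell by cell, str(value) turns every empty cell into the literal text "nan", and the column names need a row of their own. Below is a minimal sketch of how the cell text could be prepared first. The helper names cell_text and table_rows are hypothetical and are not part of tabula or python-docx, and this does not address text that is already garbled in the PDF itself.

```python
import math


def cell_text(value):
    """Return the text for one DOCX cell; missing values become an empty string."""
    if value is None:
        return ""
    # pandas marks empty cells as NaN, which is a float
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


def table_rows(columns, records):
    """Build the header row plus all data rows as lists of strings."""
    rows = [[cell_text(name) for name in columns]]
    for record in records:
        rows.append([cell_text(value) for value in record])
    return rows


if __name__ == "__main__":
    # Hypothetical sample data standing in for one extracted table
    sample = table_rows(["Name", "Qty"], [["bolt", 4], ["nut", float("nan")]])
    for row in sample:
        print(row)
```

With a DataFrame named table, the rows could be obtained as table_rows(table.columns, table.itertuples(index=False)) and then copied into the python-docx table by index.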

Upvotes: 0

Views: 30

Answers (0)