When I convert a PDF to DOCX, the text in the resulting file is unreadable. I do the conversion with tabula. I also tried pdf2docx, and its output text was unreadable too.
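```python
import tabula
from docx import Document

# Extract every table on every page as a list of pandas DataFrames.
tables = tabula.read_pdf(input_path='2.pdf', pages='all')

doc = Document()
for table in tables:
    # One extra row so the DataFrame's column headers are not lost.
    docx_table = doc.add_table(rows=len(table) + 1,
                               cols=len(table.columns),
                               style='Table Grid')
    for j, header in enumerate(table.columns):
        docx_table.cell(0, j).text = str(header)
    # iterrows() yields (index, row) pairs; data rows start below the header.
    for i, (_, row) in enumerate(table.iterrows(), start=1):
        for j, value in enumerate(row):
            docx_table.cell(i, j).text = str(value)
    doc.add_paragraph()  # blank line between consecutive tables

doc.save('123.docx')
```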
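Because pdf2docx gives the same unreadable result, the problem may lie in the PDF's own text layer rather than in tabula or in the DOCX-writing code. Below is a minimal stdlib-only sketch for checking that. It assumes that garbled extraction shows up as U+FFFD replacement characters, private-use-area code points, or stray control characters. `garbled_ratio`, `looks_garbled`, and the 0.3 threshold are hypothetical names and values chosen for illustration.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction debris.

    Heuristic (an assumption, not a standard): counts U+FFFD replacement
    characters plus code points in the Unicode categories Co (private use),
    Cc (control) and Cn (unassigned).
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cc", "Cn")
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical helper: True if the debris ratio reaches the threshold."""
    return garbled_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_garbled("Hello, world"))       # False
    print(looks_garbled("\ufffd\ufffd\ufffd abc"))  # True
```

To use it, you could pass in the cell values that tabula returned, for example `' '.join(str(v) for df in tables for v in df.to_numpy().ravel())`, and get a rough signal. If the ratio is high, the PDF's fonts probably lack a usable Unicode mapping, and a text-extraction tool alone may not fix it; OCR is a common fallback in that case. This heuristic will not catch mojibake that decodes to valid but wrong letters, such as Latin characters where Cyrillic was expected.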