Abhinav Jonnada

Reputation: 21

Extract Data from PDF and populate in Excel using Python

I need a bit of advice. I'm working on a Python program that reads data from a PDF and populates the same information in an Excel sheet. Right now I'm using PyPDF2 to extract the data, and I plan to use pandas to store it in a DataFrame and then write that DataFrame to the Excel sheet. Is this approach efficient? If there's a better way, or a flaw in my plan, please let me know.
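A minimal sketch of the plan described above (PyPDF2 for extraction, a pandas DataFrame in the middle, then `DataFrame.to_excel`). It assumes each line of extracted text is one table row with whitespace-separated cells; `text_to_rows` and `pdf_to_excel` are hypothetical names, and `to_excel` needs openpyxl installed to write .xlsx files. Real PDFs often break the one-row-per-line assumption (cells containing spaces, wrapped cells), so the splitting rule will probably need adjusting for your documents.

```python
def text_to_rows(text, sep=None):
    """Split extracted PDF text into rows of cells (hypothetical helper).

    Assumes one table row per line; cells are split on whitespace,
    or on `sep` if given. Blank lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(line.split(sep))
    return rows


def pdf_to_excel(pdf_path, xlsx_path, sep=None):
    """Extract every page with PyPDF2, build a DataFrame, write it to Excel."""
    # Imported here so text_to_rows works without these packages installed
    import PyPDF2
    import pandas as pd

    reader = PyPDF2.PdfReader(pdf_path)
    rows = []
    for page in reader.pages:
        rows.extend(text_to_rows(page.extract_text() or "", sep))

    df = pd.DataFrame(rows)  # rows of unequal length are padded with missing values
    df.to_excel(xlsx_path, index=False, header=False)  # requires openpyxl
    return df
```

If the PDF contains real tables, checking the DataFrame before saving is a cheap way to see whether the splitting rule produced sensible columns.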

Upvotes: 0

Views: 20304

Answers (1)

ASH

Reputation: 20342

I think it should be something like this.

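import PyPDF2
import openpyxl

# Open the PDF (PyPDF2 3.x no longer supports PdfFileReader, getPage or extractText)
pdfReader = PyPDF2.PdfReader('C:/Users/Excel/Desktop/TABLES.pdf')
print(len(pdfReader.pages))  # number of pages in the PDF

# Extract the text of the first page (index 0)
pageObj = pdfReader.pages[0]
mytext = pageObj.extract_text()

# Load the existing workbook and write the text into cell A1 of the active sheet
wb = openpyxl.load_workbook('C:/Users/Excel/Desktop/excel.xlsx')
sheet = wb.active
sheet.title = 'MyPDF'
sheet['A1'] = mytext

wb.save('C:/Users/Excel/Desktop/excel.xlsx')
print('DONE!!')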
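The code above reads only the first page into cell A1. A minimal sketch that writes every page to its own row instead, assuming one cell per page is what you want; `write_pages` and `pdf_pages_to_xlsx` are hypothetical names. `write_pages` only relies on `sheet["A1"] = value` assignment, which openpyxl worksheets support, so it can be tried on a plain dict.

```python
def write_pages(sheet, page_texts, column="A", start_row=1):
    """Write each page's text into its own cell, one page per row.

    `sheet` can be an openpyxl worksheet or anything supporting
    sheet["A1"] = value (e.g. a dict, for testing).
    """
    for offset, text in enumerate(page_texts):
        sheet[f"{column}{start_row + offset}"] = text


def pdf_pages_to_xlsx(pdf_path, xlsx_path):
    """Extract all pages with PyPDF2 and save them to a new workbook."""
    # Imported here so write_pages has no third-party dependency
    import PyPDF2
    import openpyxl

    reader = PyPDF2.PdfReader(pdf_path)
    texts = [page.extract_text() or "" for page in reader.pages]

    wb = openpyxl.Workbook()  # new workbook rather than load_workbook
    sheet = wb.active
    sheet.title = "MyPDF"
    write_pages(sheet, texts)
    wb.save(xlsx_path)  # overwrites xlsx_path if it already exists
```

For example, `pdf_pages_to_xlsx('C:/Users/Excel/Desktop/TABLES.pdf', 'C:/Users/Excel/Desktop/excel.xlsx')` would put page 1 in A1, page 2 in A2, and so on. Swap `openpyxl.Workbook()` for `openpyxl.load_workbook(xlsx_path)` if other sheets in that file must be preserved.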

See the link below for more details.

http://automatetheboringstuff.com/chapter12/

Upvotes: 1
