Reputation: 21
I need a bit of advice. I'm working on a Python program that reads data from a PDF, and I'm supposed to populate the same information in an Excel sheet. Right now I'm using PyPDF2 to extract the data, and I plan to use pandas to store it in a DataFrame, which would then be written to the Excel sheet. Is this approach efficient? If there's a better way, or a flaw in my plan, please let me know.
Upvotes: 0
Views: 20304
Reputation: 20342
I think it should be something like this.
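import PyPDF2
import openpyxl

# Open the PDF in binary mode and extract the text of the first page.
# Note: PyPDF2 3.x renames these calls to PdfReader, len(reader.pages),
# reader.pages[0] and extract_text().
with open('C:/Users/Excel/Desktop/TABLES.pdf', 'rb') as pdfFileObj:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
    print(pdfReader.numPages)  # number of pages in the PDF
    pageObj = pdfReader.getPage(0)
    mytext = pageObj.extractText()

# Load the existing workbook and write the text into cell A1 of the active sheet.
wb = openpyxl.load_workbook('C:/Users/Excel/Desktop/excel.xlsx')
sheet = wb.active
sheet.title = 'MyPDF'
sheet['A1'] = mytext
wb.save('C:/Users/Excel/Desktop/excel.xlsx')
print('DONE!!')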
See the link below for more details.
http://automatetheboringstuff.com/chapter12/
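If you would rather go through pandas, as the question suggests, here is a rough sketch of that route. It assumes the extracted text has one record per line with cells separated by two or more spaces, and that the first line is a header; real PDFs often don't extract that cleanly, so treat the parsing rule as a placeholder. The helper names text_to_rows and rows_to_excel are made up for illustration.

```python
import re


def text_to_rows(text):
    """Split extracted PDF text into rows of cells.

    Assumption: one record per line, cells separated by two or more spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_excel(rows, path):
    """Write rows to an Excel sheet through a pandas DataFrame."""
    # Imported here so text_to_rows stays usable without pandas installed.
    import pandas as pd

    # The first row is treated as the header (an assumption about the PDF layout).
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Writing .xlsx files requires an Excel engine such as openpyxl.
    df.to_excel(path, sheet_name="MyPDF", index=False)
    return df


# Made-up sample text standing in for the string extracted from the PDF.
sample = "Name  Qty\nApples  3\n\nPears  5\n"
print(text_to_rows(sample))
```

You would pass the string returned by the PDF extraction step to text_to_rows, then hand the result to rows_to_excel along with the output path. Going through a DataFrame is mostly worth it if you need to clean or reshape the data before writing; for dumping raw text into a cell, openpyxl alone is enough.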
Upvotes: 1