This is the code:
```python
import os
from openpyxl import Workbook
from PyPDF2 import PdfReader

input_folder = r"C:\Users\91620\OneDrive\Desktop\Final Year Project\case laws (2)\New folder (2)"
output_file = r"C:\Users\91620\OneDrive\Desktop\Final Year Project\sentemental.xlsx"

wb = Workbook()
ws = wb.active

for i, filename in enumerate(os.listdir(input_folder)):
    if filename.endswith(".pdf"):
        filepath = os.path.join(input_folder, filename)
        with open(filepath, "rb") as f:
            pdf = PdfReader(f)
            readerpdf=len(pdf.pages())
            for page in range(readerpdf):
                text = pdf.getPage(page).extractText()
                ws.cell(row=page+1, column=i+1).value = text

wb.save(output_file)
```
This is the error I am receiving:
```
TypeError                                 Traceback (most recent call last)
Cell In [9], line 16
     14         with open(filepath, "rb") as f:
     15             pdf = PdfReader(f)
---> 16             readerpdf=len(pdf.pages())
     17             for page in range(readerpdf):
     18                 text = pdf.getPage(page).extractText()

TypeError: '_VirtualList' object is not callable
```
I want the text of each PDF file in its own column of an Excel sheet.
The reason for the error is explained by the traceback. On line 16, a non-callable object is being called. Two things are called on line 16: `len()` and `pages()`. You know that `len` is callable (it is a built-in function), so the non-callable object must be `pages`. On a `PdfReader`, `pages` is a list-like object (the `_VirtualList` named in the error message), not a method.
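The same mistake can be reproduced without PyPDF2. In this small sketch, `FakeReader` is a hypothetical stand-in for `PdfReader` whose `pages` attribute is a plain list: taking `len()` of it works, while calling it raises the same kind of `TypeError` as in the question (with `list` instead of `_VirtualList` in the message).

```python
class FakeReader:
    """Hypothetical stand-in for PdfReader: `pages` is data, not a method."""

    def __init__(self):
        self.pages = ["text of page 1", "text of page 2"]


reader = FakeReader()
page_count = len(reader.pages)  # correct: len() of the sequence itself

try:
    reader.pages()  # wrong: calls the list, like pdf.pages() in the question
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(page_count)     # 2
print(error_message)  # 'list' object is not callable
```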
Simply remove the parentheses, so that the line reads `readerpdf = len(pdf.pages)`.