Reputation: 3148
I have a bunch of PDF files with text in one directory. My idea is to read them all at once and save the results in a dictionary. Right now I can only do it one file at a time, using the textract library like this:
import textract
text = textract.process('/Users/user/Documents/Data/CLAR.pdf',
                        method='tesseract',
                        language='eng')
How can I read them all at once? Do I need a for loop that walks the directory, or is there some other way?
Upvotes: 2
Views: 1174
Reputation: 1311
One solution is to use the os library with a for loop:
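import os
import textract

# Absolute paths of everything in the current working directory
files_path = [os.path.abspath(x) for x in os.listdir()]
# Keep only the .pdf files (match the extension, not any '.pdf' substring)
files_path = [pdf for pdf in files_path if pdf.lower().endswith('.pdf')]

pdfs = []
for file in files_path:
    # Same call as in the question, applied to each file in turn
    text = textract.process(file,
                            method='tesseract',
                            language='eng')
    pdfs.append(text)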
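Since the question asks for a dictionary rather than a list, here is a minimal sketch of one way to key each result by file name. The helper names `read_pdfs` and `textract_ocr` are made up for this example and are not part of textract. The extraction function is passed in as an argument so the directory logic can be tried without textract or tesseract installed. As far as I know, `textract.process` returns bytes, so the values may need decoding afterwards.

```python
from pathlib import Path
from typing import Callable, Dict


def read_pdfs(directory: str, extract: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Map each PDF file name in `directory` to whatever `extract` returns for it.

    `read_pdfs` is a hypothetical helper written for this answer.
    """
    texts = {}
    for path in sorted(Path(directory).iterdir()):
        # Match the real extension, case-insensitively, and skip subdirectories
        if path.is_file() and path.suffix.lower() == '.pdf':
            texts[path.name] = extract(str(path))
    return texts


def textract_ocr(path: str) -> bytes:
    """Hypothetical wrapper around the exact call used in the question."""
    # Imported here so read_pdfs can be used even when textract is not installed
    import textract
    return textract.process(path, method='tesseract', language='eng')


# Usage, with the directory from the question:
# pdfs = read_pdfs('/Users/user/Documents/Data', textract_ocr)
# clar_text = pdfs['CLAR.pdf'].decode('utf-8')  # assumes UTF-8 output
```

Keying by file name only works while everything sits in one directory; if you ever recurse into subfolders, using the full path as the key avoids collisions.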
Upvotes: 3