Reputation: 27
I am trying to extract some information from the first few pages of PDF documents using slate3k, as follows:
for i in range(table.shape[0]):
    print(i)
    download_path = pdf_dir + '/' + table.iloc[i, 6]
    if path.exists(download_path):
        if download_path.endswith('.pdf'):
            file = open(download_path, 'rb')
            doc = slate3k.PDF(file)
            doc = ' '.join(doc[:2])
            doc = re.sub("\n", "", doc)
And I am getting the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-474429c993a7> in <module>
     12         if download_path.endswith('.pdf'):
     13             file = open(download_path,'rb')
---> 14             doc = slate3k.PDF(file)
     15             doc = ' '.join(doc[:2])
     16             doc = re.sub("\n","",doc)

~\Anaconda3\lib\site-packages\slate3k\classes.py in __init__(self, file, password, just_text, check_extractable, char_margin, line_margin, word_margin)
     57
     58         if PYTHON_3:
---> 59             self.doc = PDFDocument()
     60             self.parser.set_document(self.doc)
     61             self.doc.set_parser(self.parser)

TypeError: __init__() missing 1 required positional argument: 'parser'
Can anyone please help me understand what this error means and how I can resolve it?
Upvotes: 0
Views: 2511
Reputation: 11
I encountered the same problem. In my case the error was caused by pyinstaller, which seems to interfere with the API that slate3k relies on.
What I have done to solve this problem:
3. (Optional) In my case I also had to uninstall pdfminer3: pip uninstall pdfminer3
Now it should work.
EDIT: It seems that pyinstaller can cause the same error, so you could try the same thing as above with pyinstaller = pdfminer.
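If you want to check whether you are hitting the same kind of conflict, here is a minimal sketch. The traceback shows slate3k calling PDFDocument() with no arguments while the PDFDocument class that actually got imported insists on a 'parser' argument, which would be consistent with a different pdfminer variant being installed than the one slate3k expects. The helper name requires_parser and the two stand-in classes are made up for illustration; they are not part of slate3k or pdfminer.

```python
import inspect


def requires_parser(cls):
    """Return True if cls.__init__ has a 'parser' parameter without a default."""
    params = inspect.signature(cls.__init__).parameters
    param = params.get("parser")
    return param is not None and param.default is inspect.Parameter.empty


# Hypothetical stand-ins for the two constructor styles (NOT the real classes).
class OldStyleDocument:
    """Mimics a PDFDocument that can be created with no arguments."""

    def __init__(self, caching=True):
        self.caching = caching


class NewStyleDocument:
    """Mimics a PDFDocument that must be given a parser up front."""

    def __init__(self, parser, password=""):
        self.parser = parser
        self.password = password


if __name__ == "__main__":
    for cls in (OldStyleDocument, NewStyleDocument):
        print(cls.__name__, "requires parser:", requires_parser(cls))
```

On your own machine you could try something like requires_parser(slate3k.classes.PDFDocument) after importing slate3k. This assumes PDFDocument is a module-level name in slate3k/classes.py, which the traceback suggests but which I have not verified for every version. If it returns True, the installed PDFDocument does not match what slate3k calls, and uninstalling the conflicting package is worth a try.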
Upvotes: 1