Reputation: 488
I am able to read the text from pdf using code:
import pdfx
pdf = pdfx.PDFx("1951.pdf")
metadata = pdf.get_metadata()
reference_list = pdf.get_references()
reference_dict = pdf.get_references_as_dict()
pdf.download_pdfs("D:/")
pdf.get_text()
But can't convert it into json:
pdfx -d D:/Output/ -j -o output.json pdf
SyntaxError: invalid syntax
Syntax: pdfx [-h] [-d OUTPUT_DIRECTORY] [-c] [-j] [-v] [-t] [-o OUTPUT_FILE] [--version] pdf
Upvotes: 1
Views: 34536
Reputation: 488
I was able to convert to XML using the Pdfminer Python module.
Upvotes: 8