Reputation: 1
I have a PDF file generated by AutoCAD, and I need to extract text from it using pdfminer.six. However, I am unable to extract the text that is part of a table (I guess it is a table or title block). Could someone advise how to extract such text? I have attached a screenshot of the portion of the PDF that needs to be extracted.
I tried using pdfplumber too, but no luck.
Upvotes: -2
Views: 47
Reputation: 1
If it is a searchable PDF (one that has a text layer), I would suggest using Camelot. If it is not searchable, for example a scan, I would use Tesseract OCR to extract the text from the table. Let me know if that helps or if you need more information about either Tesseract or Camelot.
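Here is a minimal sketch of that decision, not a tested solution for your file. It assumes pdfminer.six, camelot-py, pdf2image and pytesseract are installed (the last two also need Poppler and the Tesseract binary). The helper names `choose_extractor` and `extract_title_block` and the 20-character threshold are my own placeholders. If the drawing's text was plotted as vector outlines rather than as real text, the page would have no text layer and would take the OCR branch just like a scan.

```python
def choose_extractor(page_text: str, min_chars: int = 20) -> str:
    """Return "camelot" if the page seems to have a text layer, else "ocr".

    min_chars is an arbitrary cutoff (assumption): fewer non-whitespace
    characters than this is treated as "not searchable".
    """
    visible = "".join(page_text.split())  # drop all whitespace, incl. form feeds
    return "camelot" if len(visible) >= min_chars else "ocr"


def extract_title_block(pdf_path: str, page: int = 1):
    """Try Camelot on a searchable page, otherwise fall back to Tesseract OCR."""
    # Third-party imports are kept local so each branch only needs its own libs.
    from pdfminer.high_level import extract_text

    # pdfminer.six uses zero-based page numbers.
    text = extract_text(pdf_path, page_numbers=[page - 1])

    if choose_extractor(text) == "camelot":
        import camelot

        # "lattice" looks for ruled cell borders, which title blocks usually have.
        tables = camelot.read_pdf(pdf_path, pages=str(page), flavor="lattice")
        return [table.df for table in tables]  # list of pandas DataFrames

    from pdf2image import convert_from_path
    import pytesseract

    # Render the page to an image, then OCR the whole page.
    images = convert_from_path(pdf_path, dpi=300, first_page=page, last_page=page)
    return pytesseract.image_to_string(images[0])
```

Note that the two branches return different types (a list of DataFrames versus one string), so the caller has to handle both. If Camelot finds nothing with `flavor="lattice"`, `flavor="stream"` is worth trying, and for OCR you may get better results by cropping the image to the title block first.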
Upvotes: -1