Extract borderless table with pdfplumber

Question

I am trying to extract borderless tables from a PDF document. I have tried a few combinations of the table_settings parameter, but pdfplumber does not recognize the borderless tables correctly: the column boundaries are wrong, so several header cells are merged into one.

The PDF file can be downloaded from the link.

Here is my code:

import pdfplumber

pdf_file = "pdffile"
with pdfplumber.open(pdf_file) as pdf:
    try:
        # The table is on the eighth page (index 7).
        page = pdf.pages[7]
        table = page.extract_table(table_settings={
            "vertical_strategy": "text",
            "horizontal_strategy": "lines",
            "keep_blank_chars": True,  # boolean, not the string "True"
            "snap_tolerance": 4,
        })
        # Show the second row of the extracted table.
        print(table[1])
    except IndexError:
        # The document has fewer than 8 pages, or the table has fewer than 2 rows.
        print("")

Output:

['Element','nt Attribute Size Input Type Requirement']

Expected output:

['Element', 'Attribute', 'Size', 'Input Type', 'Requirement']
