Chirsty CR_007

Reputation: 69

pdfplumber: extracting the title when metadata is not present

I have used pdfplumber (https://github.com/jsvine/pdfplumber) to extract the text from PDF files. I went through all the properties described on the GitHub page, but I need to extract the title of a PDF when its metadata is not present.

Is there a way to do this with pdfplumber, or any other way to achieve it in Python?

import pdfplumber

pdf = pdfplumber.open(r'1.pdf')
page = pdf.pages[0]
text = page.extract_text()
# Each entry in page.chars is a dict describing one character (text, fontname, size, ...)
print(page.chars[0])

Upvotes: 3

Views: 2177

Answers (1)

Shuail_CR007

Reputation: 312

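I found the approach below. It keeps only the objects on the first page whose font size is greater than 20, on the assumption that the title is set in larger type than the body, and extracts the text from what remains.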

import pdfplumber

pdf = pdfplumber.open(r'1.pdf')
page = pdf.pages[0]

# Keep only objects with a font size above 20; objects without a "size" key count as 0
filtered = page.filter(lambda x: x.get("size", 0) > 20)
print(filtered.extract_text())
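
The fixed threshold of 20 will miss titles set in smaller type. One possible refinement, as a rough sketch only, is to take the largest font size found on the first page and keep the text at (or close to) that size. The helper names `largest_font_size` and `guess_title` and the `tolerance` parameter are hypothetical, not part of pdfplumber, and the largest text on a page is not guaranteed to be the title.

```python
def largest_font_size(chars):
    """Return the largest "size" among character dicts (such as page.chars), or None."""
    sizes = [c["size"] for c in chars if "size" in c]
    return max(sizes) if sizes else None


def guess_title(page, tolerance=0.5):
    """Guess the title as the text set in the largest font on a pdfplumber page.

    Assumption: the title uses the largest type on the page.
    `tolerance` (hypothetical parameter) allows for small rounding
    differences in the reported font sizes.
    """
    top = largest_font_size(page.chars)
    if top is None:
        return ""
    # Objects without a "size" key (lines, rects, ...) count as 0 and are dropped
    title_only = page.filter(lambda obj: obj.get("size", 0) >= top - tolerance)
    return (title_only.extract_text() or "").strip()


# Usage sketch, assuming pdfplumber is installed and 1.pdf exists:
#
#     import pdfplumber
#     with pdfplumber.open(r'1.pdf') as pdf:
#         # Prefer the metadata title when present, else fall back to the guess
#         title = pdf.metadata.get("Title") or guess_title(pdf.pages[0])
#         print(title)
```

If the document has a cover page or a large header logo rendered as text, the guess may need a different page or an extra filter on position.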

Upvotes: 3
