How to find the font size of every paragraph in a PDF file using Python?

Question

I am working on a project in which I have to find the font size of every paragraph in a PDF file. I have tried various Python libraries: fitz, PyPDF2, pdfrw, pdfminer, and pdfreader. All of them extract the text, but I don't know how to get the font size of the paragraphs. Thanks in advance; your help is appreciated.

I tried the following, but it only returns the text, not the font size.

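import fitz  # PyMuPDF

filepath = '/home/user/Downloads/abc.pdf'
text = ''
with fitz.open(filepath) as doc:
    for page in doc:
        # get_text() is the current name of the older getText() method;
        # with no arguments it returns plain text only, without font information.
        text += page.get_text()
print(text)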

V J · Accepted Answer

I found a solution using pdfminer. The Python code is given below.

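from pdfminer.high_level import extract_pages
from pdfminer.layout import LTChar, LTTextContainer

path = r'/path/to/pdf'

# One [font_size, text] pair per text block (roughly one paragraph).
extract_data = []

for page_layout in extract_pages(path):
    for element in page_layout:
        if isinstance(element, LTTextContainer):
            # Reset for each block so a size never carries over from the previous one.
            font_size = None
            for text_line in element:
                for character in text_line:
                    if isinstance(character, LTChar):
                        # Keeps the size of the last character seen in the block.
                        font_size = character.size
            if font_size is not None:
                extract_data.append([font_size, element.get_text()])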
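Since the question started from fitz (PyMuPDF), here is a rough sketch of the same idea with that library. It is not part of the original answer. It assumes that page.get_text("dict") returns a dictionary with "blocks", "lines" and "spans" entries, where each span carries "size" and "text", and it treats one text block as one paragraph, which is only an approximation. The function names block_font_sizes and pdf_font_sizes are made up for this illustration. When a block mixes sizes, the sketch picks the size that covers the most characters.

```python
from collections import Counter


def block_font_sizes(page_dict):
    """Return (dominant_size, text) for each text block in a PyMuPDF page dict."""
    results = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        sizes = Counter()
        parts = []
        for line in block.get("lines", []):
            line_text = ""
            for span in line.get("spans", []):
                text = span.get("text", "")
                # Weight each size by the number of characters that use it.
                sizes[round(span["size"], 1)] += len(text)
                line_text += text
            parts.append(line_text)
        text = " ".join(parts).strip()
        if text:
            dominant_size = sizes.most_common(1)[0][0]
            results.append((dominant_size, text))
    return results


def pdf_font_sizes(filepath):
    """Yield (page_number, dominant_size, text) for every text block in the PDF."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(filepath) as doc:
        for page_number, page in enumerate(doc, start=1):
            for size, text in block_font_sizes(page.get_text("dict")):
                yield page_number, size, text
```

To use it, loop over pdf_font_sizes('/home/user/Downloads/abc.pdf') and print each tuple. How well a block matches a real paragraph depends on how the PDF was produced, so the grouping may need adjusting for your files.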
