V J

Reputation: 171

Is there any way to get the title of a docx file by iterating over its paragraphs directly, rather than from the metadata?

Using the code below, I get the title of the docx file. More precisely, the title is the text with the largest font size on the first page. My problem is that when I edit the first page of the same docx file and give some other text a larger font size than the previous title, I don't get the output I expect: the script still prints the old text instead of the newly enlarged text. I am using Ubuntu.

import docx

doc = docx.Document('/home/user/Desktop/xyz.docx')

print("The first line of document is:", doc.paragraphs[0].text)

# Collect the font size (in points) defined by each paragraph's style
sizes = []
for p in doc.paragraphs:
    size = p.style.font.size
    if size is not None:
        sizes.append(size.pt)
print(sizes)
largest = max(sizes)
print(largest)

# Print every paragraph whose style uses the largest size
for paragraph in doc.paragraphs:
    size = paragraph.style.font.size
    if size is not None and size.pt == largest:
        print(paragraph.text)

Upvotes: 0

Views: 2851

Answers (1)

CodeMonkey

Reputation: 23738

The title is typically on the first page, but it may or may not be the first paragraph. You can iterate over the paragraphs and look for one whose style is explicitly named "Title". If no paragraph in the Word document uses the Title style, you can fall back to the paragraph with the largest point size and assume that it is the title.

import docx

doc = docx.Document('xyz.docx')

title_size = max_size = 0
max_size_text = title = None
for p in doc.paragraphs:
    style = p.style
    if style is not None:
        if style.name == 'Title':
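            # The Title style may not set an explicit size, so guard against None
            title_size = style.font.size.pt if style.font.size is not None else None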
            title = p.text
            break
        size = style.font.size
        if size is not None:
            if size.pt > max_size:
                max_size = size.pt
                max_size_text = p.text

if title is not None:
    print(f"Title: size={title_size} text='{title}'")
else:
    print(f"max size title: size={max_size} text='{max_size_text}'")

If title is None, no paragraph uses the explicit Title style, so the text with the largest point size is used instead.
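One likely reason an edited document keeps returning the old text: changing the font size in Word's toolbar usually applies direct formatting to the runs inside the paragraph, while p.style.font.size only reports the size defined by the paragraph style. Below is a minimal sketch, assuming that is what happened, which checks run.font.size first and falls back to the style size. The helper names (effective_size_pt, guess_title, guess_title_from_file) are made up for illustration, and the sketch does not resolve sizes inherited from a base style, character styles, or document defaults.

```python
def effective_size_pt(paragraph):
    """Largest font size (in points) found in the paragraph, or None.

    Direct run formatting (run.font.size) takes precedence over the
    paragraph style, so sizes set by hand in Word are picked up.
    """
    sizes = []
    for run in paragraph.runs:
        if not run.text.strip():
            continue  # ignore empty or whitespace-only runs
        size = run.font.size
        if size is None and paragraph.style is not None:
            size = paragraph.style.font.size  # fall back to the style's own size
        if size is not None:
            sizes.append(size.pt)
    return max(sizes, default=None)


def guess_title(paragraphs):
    """Return (size_pt, text) for the first paragraph with the largest size."""
    best = (None, None)
    for p in paragraphs:
        size = effective_size_pt(p)
        if size is not None and (best[0] is None or size > best[0]):
            best = (size, p.text)
    return best


def guess_title_from_file(path):
    """Open a .docx file with python-docx and guess its title."""
    import docx  # imported here so the helpers above work on any paragraph-like objects
    return guess_title(docx.Document(path).paragraphs)
```

Calling guess_title_from_file('xyz.docx') returns a (size, text) tuple, or (None, None) when no size can be determined. Note that doc.paragraphs covers the whole document, since python-docx has no notion of pages, so limiting the search to the first page would need a separate heuristic.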

Upvotes: 2
