Reputation: 171
Using the code below, I get the title of a docx file. More precisely, I treat the text with the largest font size on the first page as the title. My problem is that when I edit the first page of the same docx file and make some other text larger than the previous title, I do not get the output I expect: the script still prints the old text instead of the newly enlarged text. I am using Ubuntu.
import docx

doc = docx.Document('/home/user/Desktop/xyz.docx')
print("The first line of document is:", doc.paragraphs[0].text)

# Collect the font size (in points) of every paragraph style that defines one.
sizes = []  # renamed from `list` to avoid shadowing the built-in
for p in doc.paragraphs:
    size = p.style.font.size
    if size is not None:
        sizes.append(size.pt)
print(sizes)
print(max(sizes))

# Print every paragraph whose style uses the largest size found.
for paragraph in doc.paragraphs:
    size = paragraph.style.font.size
    if size is not None:
        if size.pt == max(sizes):
            print(paragraph.text)
Upvotes: 0
Views: 2851
Reputation: 23738
The title is typically on the first page, but it may or may not be the first paragraph. You can iterate over the paragraphs and look for one whose style is named "Title", which explicitly marks that paragraph as the title. If the Title style is not used in the Word document, you can instead take the paragraph with the largest point size and assume that it is the title.
import docx

doc = docx.Document('xyz.docx')
title_size = max_size = 0
max_size_text = title = None
for p in doc.paragraphs:
    style = p.style
    if style is not None:
        if style.name == 'Title':
            # The size may be None when it is inherited from a base style.
            title_size = style.font.size.pt if style.font.size is not None else None
            title = p.text
            break
        size = style.font.size
        if size is not None:
            # Remember the paragraph with the largest style size seen so far.
            if size.pt > max_size:
                max_size = size.pt
                max_size_text = p.text
if title is not None:
    print(f"Title: size={title_size} text='{title}'")
else:
    # Fallback: report the largest size found, not title_size (which is still 0).
    print(f"max size title: size={max_size} text='{max_size_text}'")
If title is None, no paragraph uses the Title style, so the text with the largest point size is used instead.
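A likely reason the edited file still gives the old result: p.style.font.size reads the size defined by the paragraph style, and changing the font size of selected text in Word normally stores the new size as direct formatting on the runs, leaving the style untouched. The following is a minimal sketch, not a definitive implementation, that checks run.font.size first and falls back to the run's character style and then the paragraph style. The helper names (effective_size_pt, largest_text, find_title) are made up for illustration, and the sketch assumes that ignoring base-style inheritance and document defaults is acceptable.

```python
def effective_size_pt(run, paragraph):
    """Return the font size in points that applies to `run`, or None if unset.

    Checks direct run formatting first, then the run's character style,
    then the paragraph style. Base styles and document defaults are
    NOT followed (assumption made to keep the sketch short).
    """
    candidates = [run.font.size]
    if run.style is not None:
        candidates.append(run.style.font.size)
    if paragraph.style is not None:
        candidates.append(paragraph.style.font.size)
    for size in candidates:
        if size is not None:
            return size.pt
    return None


def largest_text(paragraphs):
    """Return (size_pt, text) of the paragraph containing the largest run."""
    best_size, best_text = None, None
    for p in paragraphs:
        for run in p.runs:
            if not run.text.strip():
                continue  # ignore whitespace-only runs
            size = effective_size_pt(run, p)
            if size is not None and (best_size is None or size > best_size):
                best_size, best_text = size, p.text
    return best_size, best_text


def find_title(path):
    """Open a .docx file and return (size_pt, text) of its largest text."""
    import docx  # python-docx; imported lazily so the helpers above work without it
    doc = docx.Document(path)
    return largest_text(doc.paragraphs)
```

It could be called as size, text = find_title('xyz.docx'). On ties the first paragraph encountered wins, because the comparison is strictly greater-than.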
Upvotes: 2