Reputation: 1295
I am trying to batch manipulate many microsoft word documents in .docx format within python.
The following code accomplishes what I need, except it loses special characters I would like to preserve, like the right arrow symbol and bullets.
import docx
def getText(filename):
doc = docx.Document(filename)
fullText = []
for para in doc.paragraphs:
fullText.append(para.text)
return fullText
getText('example.docx')
Upvotes: 1
Views: 2704
Reputation: 28903
The Paragraph.text
property in python-pptx
returns the plain text in the paragraph as a string. This is a very common requirement.
Bullets, or numbered lists in general (of which bullets are a type) are not reflected in the text of a paragraph, even though it may appear that way on-screen. This sort of thing will be an additional property of the paragraph.
One way bullets can be applied is using the 'List Bullet' style. The paragraph style is available on Paragraph.style
.
The documentation here is your friend for this and other details, in particular the 11 topics in the User Guide section:
http://python-docx.readthedocs.io/en/latest/
Upvotes: 2