joe wong

Reputation: 473

Can I change text in MS Word using python-docx, without losing characteristics?

I have an English document in MS Word and I want to change its text to Chinese using Python. I'm using Python 3.4 with python-docx installed. Here's my code:

from docx import Document

# 'input.docx' and 'output.docx' are placeholder paths
document = Document('input.docx')
# Change only the text of the first two paragraphs
document.paragraphs[0].text = '带有消毒模式的地板清洁机'  # "Floor cleaning machine with disinfection mode"
document.paragraphs[1].text = '背景'  # "Background"
document.save('output.docx')

The first two lines did turn into Chinese characters, but formatting such as the font and bold is gone. The original file looks like this:

[screenshot of the original file]

and the new file looks like this:

[screenshot of the new file]

Is there any way I could alter the text without losing the original formatting?

Upvotes: 1

Views: 2315

Answers (2)

scanny

Reputation: 29031

It depends on how the characteristics are applied. Word resolves formatting through what is called the style hierarchy: a text characteristic can be applied directly to a run of text, by a style, or by a document default, with levels in between.

There are two main classes of characteristic: paragraph properties and run properties. Paragraph properties are things like justification, space before and after, etc. Everything having to do with character-level formatting, like size, typeface, color, subscript, italic, bold, etc. is a run property, also loosely known as a font.
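To see which level a characteristic is applied at, it can help to list what is set directly on a paragraph and on each of its runs. The following is a minimal sketch, assuming python-docx's `Paragraph.alignment`, `Paragraph.paragraph_format` and `Run.font` properties; the helper name `describe_formatting` is made up for illustration. A value of `None` means the property is not set at that level and is inherited from further up the style hierarchy.

```python
def describe_formatting(paragraph):
    """Return the formatting set directly on a paragraph and on each of its runs.

    None means "not set here, inherited from the style hierarchy".
    """
    fmt = paragraph.paragraph_format
    return {
        # Paragraph properties
        "alignment": paragraph.alignment,
        "space_before": fmt.space_before,
        "space_after": fmt.space_after,
        # Run properties (the "font")
        "runs": [
            {
                "text": run.text,
                "bold": run.font.bold,
                "italic": run.font.italic,
                "size": run.font.size,
                "typeface": run.font.name,
            }
            for run in paragraph.runs
        ],
    }
```

Calling it on `Document(path).paragraphs[0]` for the original file should show whether bold and the typeface are set on the runs themselves or come from a style.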

So if you want to preserve the font of a run of text, you need to operate at the run level. An operation like this will preserve font formatting:

run.text = "New text"

An operation like this will preserve paragraph formatting, but remove any character-level formatting not applied by the paragraph style:

paragraph.text = "New paragraph text"

You'll need to decide for your application whether you modify individual runs (which may be tricky to identify) or whether you work perhaps with distinct paragraphs and apply different styles to each. I recommend the latter. So in your example, "FLOOR CLEANING MACHINE ...", "BACKGROUND", and "[0001]..." would each become distinct paragraphs. In your screenshot they appear as separate runs in a single paragraph, separated by a line break.
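For the run-level option, here is a rough sketch of what the replacement could look like. It assumes each phrase to be translated sits in a single run, which Word does not guarantee, since it can split text into several runs. The helper names `replace_run_text` and `translate_document` are hypothetical, and the paths are supplied by the caller.

```python
def replace_run_text(paragraph, translations):
    """Replace the text of every run whose stripped text is a key in `translations`.

    Assigning to run.text keeps that run's font formatting.
    Returns the number of runs that were changed.
    """
    changed = 0
    for run in paragraph.runs:
        text = run.text
        key = text.strip()
        if key in translations:
            # Keep surrounding whitespace, such as a trailing line break
            lead = text[: len(text) - len(text.lstrip())]
            trail = text[len(text.rstrip()):]
            run.text = lead + translations[key] + trail
            changed += 1
    return changed


def translate_document(in_path, out_path, translations):
    """Usage sketch: apply the replacements to every paragraph and save a copy."""
    from docx import Document  # requires python-docx

    document = Document(in_path)
    for paragraph in document.paragraphs:
        replace_run_text(paragraph, translations)
    document.save(out_path)
```

For example, `translate_document(in_path, out_path, {"BACKGROUND": "背景"})` would swap that one heading and leave every other run alone. Runs that do not match exactly are skipped, so the return value of `replace_run_text` is worth checking.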

Upvotes: 1

Steve Barnes

Reputation: 28415

You can get the style of the existing paragraphs and apply it to your new paragraphs - beware that the existing paragraphs might specify a font that does not support Chinese.
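A minimal sketch of both points, with the caveats that the helper names are made up and that the East Asian font part relies on python-docx's private `_element` attribute, because the library has no public setter for it as far as I know. Copying the style carries over the paragraph style only, not formatting applied directly to the old runs.

```python
def add_paragraph_like(document, template, text):
    """Append a paragraph containing `text`, reusing the paragraph style of `template`."""
    return document.add_paragraph(text, style=template.style)


def set_east_asian_font(run, font_name):
    """Point a run at a typeface that has CJK glyphs; font_name is the caller's choice."""
    from docx.oxml.ns import qn  # requires python-docx

    # Sets the typeface used for Latin text
    run.font.name = font_name
    # Assumption: the East Asian typeface is stored in the w:eastAsia attribute
    # of the run's rFonts element, reached here through the private _element.
    run._element.rPr.rFonts.set(qn("w:eastAsia"), font_name)
```

Usage would be something like `new = add_paragraph_like(document, document.paragraphs[0], "背景")` followed by `set_east_asian_font(new.runs[0], "SimSun")`, where "SimSun" is only an example of a font with Chinese glyphs.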

Upvotes: 0
