Reputation: 6040
I am currently dealing with a large .docx file (roughly 400 pages). It is divided up into sections that are very easily digestable by humans and look like this :
Bold text
Written paragraph
This is perfectly humanly readable and great. Unfortunately we have an in-house program in our University that uses the mark-up of .docx files to sort them out/do some processing on them. By this I mean that sectioning a .doc/.docx using only bold markup is not enough, you must use the in-built tools within MS Office to do this (as below) :
So what I need to write is a simple script that will find the text that is bold within a .docx document and convert this text to properly marked up "Heading 1"s, or similar. It doesn't concern me whether or not the .docx file format is maintained or anything like this.
is it possible to do this? What APIs/languages/tools should I start looking into to accomplish this relatively simple task?
Upvotes: 1
Views: 199
Reputation: 176159
Using a short VBA macro you can iterate over all paragraphs and change the style for all paragraphs containing only bold text into a heading style:
Sub FormatBoldAsHeading()
Dim p As Paragraph
For Each p In ActiveDocument.Paragraphs
If p.Range.Font.Bold <> wdUndefined And p.Range.Font.Bold Then
p.Style = WdBuiltinStyle.wdStyleHeading1
End If
Next
End Sub
Upvotes: 1