Simon Kiely

Reputation: 6040

Converting bold text within a .doc to marked-up text programmatically

I am currently dealing with a large .docx file (roughly 400 pages). It is divided into sections that are easily digestible by humans and look like this:

Bold text

Written paragraph

This is perfectly readable for humans. Unfortunately, we have an in-house program at our university that relies on the markup of .docx files to sort and process them. Sectioning a .doc/.docx with bold formatting alone is not enough for it; the sections must be marked with the built-in heading styles of MS Office (as below):

[Image: the MS Office styles menu, where you can select a piece of text and set it to Heading 1, Heading 2, and so on.]

So what I need to write is a simple script that finds the bold text within a .docx document and converts it to properly marked-up "Heading 1" paragraphs, or similar. It doesn't matter to me whether the .docx file format is preserved.

Is it possible to do this? What APIs/languages/tools should I start looking into to accomplish this relatively simple task?

Upvotes: 1

Views: 199

Answers (1)

Dirk Vollmar

Reputation: 176159

Using a short VBA macro you can iterate over all paragraphs and change the style for all paragraphs containing only bold text into a heading style:

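' Applies the built-in "Heading 1" style to every paragraph that is entirely bold.
Sub FormatBoldAsHeading()

    Dim p As Paragraph

    For Each p In ActiveDocument.Paragraphs
        ' Font.Bold is True, False, or wdUndefined (mixed bold and non-bold text).
        ' Only paragraphs whose whole range is bold pass this check.
        If p.Range.Font.Bold <> wdUndefined And p.Range.Font.Bold Then
            p.Style = WdBuiltinStyle.wdStyleHeading1
        End If
    Next

End Sub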
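
If you would rather do this outside Word, the same rule can be applied to the WordprocessingML inside the .docx (the `word/document.xml` part). What follows is only a minimal sketch in Python using the standard library, not a tested converter. The function names are made up for illustration, and it assumes that bold is applied directly to the runs (`<w:b/>`) rather than inherited from a style, and that a style with the ID `Heading1` is defined in the document. It runs on a small hand-written fragment, not a real file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
ET.register_namespace("w", W_NS)

# Values of w:val that switch an on/off property such as bold off.
OFF_VALUES = {"0", "false", "off"}


def run_is_bold(run):
    """True if the run carries direct bold formatting (<w:b/> not switched off)."""
    bold = run.find(f"{W}rPr/{W}b")
    return bold is not None and bold.get(f"{W}val", "true") not in OFF_VALUES


def paragraph_is_all_bold(paragraph):
    """True if the paragraph has visible text and every text-bearing run is bold."""
    text_runs = [
        run for run in paragraph.iter(f"{W}r")
        if any((t.text or "").strip() for t in run.iter(f"{W}t"))
    ]
    return bool(text_runs) and all(run_is_bold(run) for run in text_runs)


def set_paragraph_style(paragraph, style_id):
    """Point the paragraph at style_id, creating <w:pPr>/<w:pStyle> if missing."""
    ppr = paragraph.find(f"{W}pPr")
    if ppr is None:
        ppr = ET.Element(f"{W}pPr")
        paragraph.insert(0, ppr)  # w:pPr has to be the first child of w:p
    pstyle = ppr.find(f"{W}pStyle")
    if pstyle is None:
        pstyle = ET.Element(f"{W}pStyle")
        ppr.insert(0, pstyle)  # w:pStyle has to be the first child of w:pPr
    pstyle.set(f"{W}val", style_id)


def promote_bold_paragraphs(root, style_id="Heading1"):
    """Restyle every all-bold paragraph under root; return how many were changed."""
    changed = 0
    # Copy to a list first so the tree is not modified while it is being iterated.
    for paragraph in list(root.iter(f"{W}p")):
        if paragraph_is_all_bold(paragraph):
            set_paragraph_style(paragraph, style_id)
            changed += 1
    return changed


# Hand-written fragment mirroring the "Bold text / Written paragraph" layout.
SAMPLE = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Bold text</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Written paragraph</w:t></w:r></w:p>"
    "</w:body>"
)

root = ET.fromstring(SAMPLE)
count = promote_bold_paragraphs(root)
styles = [p.find(f"{W}pPr/{W}pStyle") for p in root.iter(f"{W}p")]
print(count)  # prints 1
```

For a real file you would read `word/document.xml` out of the .docx with `zipfile` and write the modified XML back. Be aware that re-serializing a full Word document with `xml.etree.ElementTree` can rewrite or drop namespace prefixes that the document refers to elsewhere, so a dedicated library is probably the safer choice for production use. Like the macro, this treats paragraphs with mixed bold and non-bold text as ordinary paragraphs.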

Upvotes: 1
