Daniel Mason
Daniel Mason

Reputation: 3

Splitting out a Word Document using VBA

Pretty new to the world of VBA, but have been asked if I can make some QoL improvements for routine tasks at work. One task I am struggling to get VBA working for is separating out a word document based on headings? For example we would receive a document along the following lines:

Info Line 1 Info Line 2

Data_Start

-multiple lines of data-

Data_Start

-multiple lines of data-

And so on. What I would like to do is run a Macro that would separate out each "Data_Start" section into a new document - ideally with the same Document Title suffixed with A, B, C etc to denote each new section (or numerically if alphabetically is an issue). Another wishlist item is to have those documents save into the same folder as the original.

I found some short vba code that separates out based on section breaks, with the aim of using it as a jumping off point to build on. However my limited knowledge of VBA is leading to roadblocks in getting things working correctly and I am running into issues with documents saving correctly (place and/or filename).

I understand this a big request but any help would be greatly appreciated!

Upvotes: 0

Views: 856

Answers (3)

Timothy Rylatt
Timothy Rylatt

Reputation: 7860

To split a document at headings you first need to find the headings. As Word's Find function can do this we don't need to resort to crawling through the document paragraph by paragraph.

When coding in Word it is desirable to avoid using the Selection object. This is because using Selection will cause the screen to be redrawn each time it changes, slowing down the code. Instead we work with Range objects.

When Find returns a match .Execute will return a value of True and the Range is redefined to the range of the match.

To get the full text below the heading we can use one of the predefined bookmarks in Word, \HeadingLevel.

To avoid using the clipboard, which slows things down when used within a loop, we make use of the FormattedText property of the range.

Sub SplitDocumentAtHeadings()
        
    Dim findRange As Range: Set findRange = ActiveDocument.Content
    Dim newDoc As Document
    Dim index As Long
    Dim saveRange As Range
    Dim savePath As String: savePath = ActiveDocument.Path & "\"
    Dim saveName As String
    
    With findRange
        With .Find
            .ClearFormatting
            .Style = ActiveDocument.Styles(wdStyleHeading1)
            .Forward = True
            .Format = True
            .Wrap = wdFindStop
        End With
        
        Do While .Find.Execute
            index = index + 1
            'use the heading text as the save name
            saveName = .Text & index & ".docx"
            'get the heading and data that follows it
            Set saveRange = .Duplicate
            'saveRange.Move wdParagraph
            Set saveRange = saveRange.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
            Set newDoc = Documents.Add
            'use formatted text to avoid using the clipboard
            newDoc.Content.FormattedText = saveRange.FormattedText
            newDoc.SaveAs2 FileName:=savePath & saveName, Fileformat:=WdSaveFormat.wdFormatXMLDocument
            .Collapse wdCollapseEnd
        Loop
    End With

End Sub

EDIT:

In response to OP's answer. Please read comments in code to understand where mistakes were made.

Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()

Application.ScreenUpdating = False
Dim Rng As Range
Dim DocSrc As Document
Dim DocTgt As Document
'incorrectly spelt variable name
'Dim sectionCoun As Integer
Dim sectionCount As Integer
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
sectionCount = 0

With DocSrc.Range
    With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Forward = True
        .Text = "Data_Start"
        .Style = wdStyleHeading1
        .Replacement.Text = ""
        .Wrap = wdFindStop
        '.Execute
    End With
    'Find.Found is unreliable. Use Execute instead
    'Do While .Find.Found
    Do While .Find.Execute
        'added incrementing of the section count
        sectionCount = sectionCount + 1
        Set Rng = .Paragraphs(1).Range
        Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
        Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
        With DocTgt
            'Not necessary as already set at the beginning
            'Application.ScreenUpdating = False
            .Range.FormattedText = Rng.FormattedText
            StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
            ' Strip out illegal characters
            For i = 1 To Len(StrNoChr)
                StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
            Next
            '.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
            
            'variable sectionCount was declared as sectionCoun and is not incremented in the code
            .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
            .Close False
        End With
    
        .Start = Rng.End
        'Not needed
        '.Find.Execute
    Loop
End With

Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True

End Sub

Upvotes: 1

Daniel Mason
Daniel Mason

Reputation: 3

@timothyrylatt I have gotten pretty close with this, which is in the same vein and does borrow some elements:

Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()

    Application.ScreenUpdating = False
    Dim Rng As Range
    Dim DocSrc As Document
    Dim DocTgt As Document
    Dim sectionCoun As Integer
    Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
    Set DocSrc = ActiveDocument
    sectionCount = 0
    
    With DocSrc.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Forward = True
        .Text = "Data_Start"
        .Style = wdStyleHeading1
        .Replacement.Text = ""
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        Set Rng = .Paragraphs(1).Range
        Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
        Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
        With DocTgt
        Application.ScreenUpdating = False
          .Range.FormattedText = Rng.FormattedText
          StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
          ' Strip out illegal characters
          For i = 1 To Len(StrNoChr)
            StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
          Next
          '.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
          .SaveAs2 fileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
          .Close False
        End With
        .Start = Rng.End
        .Find.Execute
      Loop
    End With
    
    Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
    Application.ScreenUpdating = True
    
End Sub

The only issue is now it only prints the final section of the document? I have a feeling it is printing each Section but keeping the same file name/not adding the increment so it is overwriting itself? Am I on the right track with this thinking? Thanks again for your continued help

Upvotes: 0

Dylan M. Loszak
Dylan M. Loszak

Reputation: 34

The general idea is to identify your "Data_Start" headings (making sure they have a consistent heading style), loop through the document, split the content at those headings, and then carefully create and save new documents with your naming scheme. Here's some code as a starting point, keeping in mind a few things: double-check the heading style of your "Data_Start" sections, and we'll need a helper function (CreateNewDocFromRange) to handle the details of creating a new document. Also, I might have slipped in a minor 'imperfection' like using SaveAs2 when the regular .SaveAs is needed – a common oversight when dealing with different Word versions.

Here is an example (apologies for just any formatting issues, I’m on my phone):

Sub SplitDocumentByHeadings()
Dim doc As Document, secStart As Range, newDoc As Document
Dim i As Integer, originalPath As String

Set doc = ActiveDocument: originalPath = doc.Path & "\" 
i = 1 

For Each secStart In doc.Range.Paragraphs
    If secStart.Style = "Heading 1" And secStart.Text = "Data_Start" Then
        Set newDoc = CreateNewDocFromRange(secStart.Range) 
        newDoc.SaveAs2 Filename:= originalPath & doc.Name & Chr(64 + i) & ".docx"
        i = i + 1 
    End If
Next secStart

End Sub

Upvotes: 0

Related Questions