Reputation: 3
Pretty new to the world of VBA, but have been asked if I can make some QoL improvements for routine tasks at work. One task I am struggling to get VBA working for is separating out a word document based on headings? For example we would receive a document along the following lines:
Info Line 1 Info Line 2
Data_Start
-multiple lines of data-
Data_Start
-multiple lines of data-
And so on. What I would like to do is run a Macro that would separate out each "Data_Start" section into a new document - ideally with the same Document Title suffixed with A, B, C etc to denote each new section (or numerically if alphabetically is an issue). Another wishlist item is to have those documents save into the same folder as the original.
I found some short vba code that separates out based on section breaks, with the aim of using it as a jumping off point to build on. However my limited knowledge of VBA is leading to roadblocks in getting things working correctly and I am running into issues with documents saving correctly (place and/or filename).
I understand this a big request but any help would be greatly appreciated!
Upvotes: 0
Views: 856
Reputation: 7860
To split a document at headings you first need to find the headings. As Word's Find
function can do this we don't need to resort to crawling through the document paragraph by paragraph.
When coding in Word it is desirable to avoid using the Selection
object. This is because using Selection
will cause the screen to be redrawn each time it changes, slowing down the code. Instead we work with Range objects.
When Find
returns a match .Execute
will return a value of True and the Range is redefined to the range of the match.
To get the full text below the heading we can use one of the predefined bookmarks in Word, \HeadingLevel.
To avoid using the clipboard, which slows things down when used within a loop, we make use of the FormattedText property of the range.
Sub SplitDocumentAtHeadings()
Dim findRange As Range: Set findRange = ActiveDocument.Content
Dim newDoc As Document
Dim index As Long
Dim saveRange As Range
Dim savePath As String: savePath = ActiveDocument.Path & "\"
Dim saveName As String
With findRange
With .Find
.ClearFormatting
.Style = ActiveDocument.Styles(wdStyleHeading1)
.Forward = True
.Format = True
.Wrap = wdFindStop
End With
Do While .Find.Execute
index = index + 1
'use the heading text as the save name
saveName = .Text & index & ".docx"
'get the heading and data that follows it
Set saveRange = .Duplicate
'saveRange.Move wdParagraph
Set saveRange = saveRange.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set newDoc = Documents.Add
'use formatted text to avoid using the clipboard
newDoc.Content.FormattedText = saveRange.FormattedText
newDoc.SaveAs2 FileName:=savePath & saveName, Fileformat:=WdSaveFormat.wdFormatXMLDocument
.Collapse wdCollapseEnd
Loop
End With
End Sub
EDIT:
In response to OP's answer. Please read comments in code to understand where mistakes were made.
Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()
Application.ScreenUpdating = False
Dim Rng As Range
Dim DocSrc As Document
Dim DocTgt As Document
'incorrectly spelt variable name
'Dim sectionCoun As Integer
Dim sectionCount As Integer
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
sectionCount = 0
With DocSrc.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = True
.Forward = True
.Text = "Data_Start"
.Style = wdStyleHeading1
.Replacement.Text = ""
.Wrap = wdFindStop
'.Execute
End With
'Find.Found is unreliable. Use Execute instead
'Do While .Find.Found
Do While .Find.Execute
'added incrementing of the section count
sectionCount = sectionCount + 1
Set Rng = .Paragraphs(1).Range
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
With DocTgt
'Not necessary as already set at the beginning
'Application.ScreenUpdating = False
.Range.FormattedText = Rng.FormattedText
StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
' Strip out illegal characters
For i = 1 To Len(StrNoChr)
StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
Next
'.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
'variable sectionCount was declared as sectionCoun and is not incremented in the code
.SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
.Close False
End With
.Start = Rng.End
'Not needed
'.Find.Execute
Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub
Upvotes: 1
Reputation: 3
@timothyrylatt I have gotten pretty close with this, which is in the same vein and does borrow some elements:
Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()
Application.ScreenUpdating = False
Dim Rng As Range
Dim DocSrc As Document
Dim DocTgt As Document
Dim sectionCoun As Integer
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
sectionCount = 0
With DocSrc.Range
With .Find
.ClearFormatting
.Replacement.ClearFormatting
.Format = True
.Forward = True
.Text = "Data_Start"
.Style = wdStyleHeading1
.Replacement.Text = ""
.Wrap = wdFindStop
.Execute
End With
Do While .Find.Found
Set Rng = .Paragraphs(1).Range
Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
With DocTgt
Application.ScreenUpdating = False
.Range.FormattedText = Rng.FormattedText
StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
' Strip out illegal characters
For i = 1 To Len(StrNoChr)
StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
Next
'.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
.SaveAs2 fileName:=DocSrc.Path & "\" & StrTxt & sectionCount & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
.Close False
End With
.Start = Rng.End
.Find.Execute
Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub
The only issue is now it only prints the final section of the document? I have a feeling it is printing each Section but keeping the same file name/not adding the increment so it is overwriting itself? Am I on the right track with this thinking? Thanks again for your continued help
Upvotes: 0
Reputation: 34
The general idea is to identify your "Data_Start" headings (making sure they have a consistent heading style), loop through the document, split the content at those headings, and then carefully create and save new documents with your naming scheme. Here's some code as a starting point, keeping in mind a few things: double-check the heading style of your "Data_Start" sections, and we'll need a helper function (CreateNewDocFromRange) to handle the details of creating a new document. Also, I might have slipped in a minor 'imperfection' like using SaveAs2 when the regular .SaveAs is needed – a common oversight when dealing with different Word versions.
Here is an example (apologies for just any formatting issues, I’m on my phone):
Sub SplitDocumentByHeadings()
Dim doc As Document, secStart As Range, newDoc As Document
Dim i As Integer, originalPath As String
Set doc = ActiveDocument: originalPath = doc.Path & "\"
i = 1
For Each secStart In doc.Range.Paragraphs
If secStart.Style = "Heading 1" And secStart.Text = "Data_Start" Then
Set newDoc = CreateNewDocFromRange(secStart.Range)
newDoc.SaveAs2 Filename:= originalPath & doc.Name & Chr(64 + i) & ".docx"
i = i + 1
End If
Next secStart
End Sub
Upvotes: 0