Reputation: 4561
I am trying to figure out a way using VBA to parse a Word Document so I can put its contents into an array. For this example I have two companies in a Word Document (as seen below the code) and I want to put the fields into an array.
Public Sub ParseCompanies()
Dim Company_Array(1 To 2) As String 'stores individual company fields
Dim Companies_Array() 'array for all companies
Dim oWord As Object, oDoc As Object
Set oWord = CreateObject("Word.Application")
Set oDoc = oWord.Documents.Open("C:/Temp/test.docx", Visible:=True)
Dim singleLine
Dim lineText As String
'need to rewrite this section
For Each singleLine In oDoc.Paragraphs
lineText = singleLine.Range.Text
Debug.Print lineText
Next singleLine
End Sub
Word file contents cut and pasted onto Stack Overflow:
Company: Aladin Carpets
Product: Magic Carpets
Company: Aerials Seashells
Product: Seashells
The way the current script runs can be seen below in the VBA Debugger Output
Is there an efficient way to do this? A way to delimit the lines or section splitters in the word document in order to parse the individual companies?
Upvotes: 1
Views: 298
Reputation: 2800
Solution:
If the output is as stated -I copied your data but I get different results-, this should work, if not, just adjust the element being saved in the array:
Public Sub ParseCompanies()
Dim Products_Array() As String 'stores individual company fields
Dim Companies_Array() As String 'array for all companies
Dim CounterElements As Long: CounterElements = 1
Dim CounterParagraphs As Long
Dim oWord As Object, oDoc As Object
Set oWord = CreateObject("Word.Application")
On Error GoTo Err01ParseCompanies
Set oDoc = oWord.Documents.Open("C:\Users\lz630z\Desktop\Company.docx", visible:=True)
Dim singleLine
Dim lineText As String
'need to rewrite this section
For CounterParagraphs = 1 To oDoc.Paragraphs.Count
If InStr(oDoc.Paragraphs(CounterParagraphs).Range.Text, "Company") Then ReDim Preserve Companies_Array(CounterElements): Companies_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + 2)
If InStr(oDoc.Paragraphs(CounterParagraphs).Range.Text, "Product") Then ReDim Preserve Products_Array(CounterElements): Products_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + 2): CounterElements = CounterElements + 1
Next CounterParagraphs
If 1 = 2 Then ' 99. If error
Err01ParseCompanies:
MsgBox "Word Error", vbCritical
End If '99. If error
Set oDoc = Nothing
Set oWord = Nothing
End Sub
Summary of changes/suggestions
For each
won't work here, since according to the screenshot is going to be 2 rows after it found the first result, it's better to have everything controlled in this scenario and save the elements in the array accordingly, changed a For/To
approach (I assumed you arrays meant to be as defined now).
Whenever you are referring to one, for the size the other will be accordingly.
IG: Companies_Array(1) will be Aladin Carpets and Products_Array(1) will be Magic Carpets
Upvotes: 2
Reputation: 3279
If your word document items are truly paragraph delimited, you could use the Split
method to fill your array and then loop through it to manipulate the data. For example, this just fills the array and prints the elements to the immediate window:
Public Sub ParseCompanies()
Dim wordList() As String
Dim i As Long
Dim oWord As Word.Application
Dim oDoc As Word.Document
Set oWord = CreateObject("Word.Application")
Set oDoc = oWord.Documents.Open("C:\Users\test\Desktop\Company.docx", Visible:=False)
wordList = Split(oDoc.Content.Text, vbCr) 'split using carriage return (paragraphs)
For i = 0 To UBound(wordList, 1)
Debug.Print wordList(i)
Next i
oWord.Quit
End Sub
I can't speak to the performance of this method on a large file, so it may require testing before this can be considered a viable option.
Upvotes: 0