Reputation: 7095
This is a follow-up to my earlier question, "Regular Expressions in Excel VBA".
I have come across additional patterns I need to match, which I believe are beyond the scope of my original question. Here is my existing code:
Sub ImportFromDTD()
    ' Needs a reference to "Microsoft VBScript Regular Expressions 5.5".
    Dim sDTDFile As Variant
    Dim ffile As Long
    Dim Lines() As String
    Dim i As Long
    Dim J As Long
    Dim sExtract As String
    Dim Reg1 As RegExp
    Dim M1 As MatchCollection
    Dim M As Match

    sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
                                           "Browse for file to be imported")
    If sDTDFile = False Then Exit Sub '(user cancelled import file browser)

    ' Read the whole file and split it into lines.
    ffile = FreeFile
    Open sDTDFile For Input Access Read As #ffile
    Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
    Close #ffile

    ' The pattern does not change, so set it up once, outside the loop.
    Set Reg1 = New RegExp
    With Reg1
        .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Cells(1, 2) = "From DTD"
    J = 2
    For i = 0 To UBound(Lines)
        'Debug.Print "Line"; i; "="; Lines(i)
        If Reg1.Test(Lines(i)) Then
            Set M1 = Reg1.Execute(Lines(i))
            For Each M In M1
                ' Prefer the child name (e.g. InvoiceDetails); fall back to the element name.
                sExtract = M.SubMatches(2)
                If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
                sExtract = Replace(sExtract, Chr(13), "")
                Cells(J, 2) = sExtract
                J = J + 1
                'Debug.Print sExtract
            Next M
        End If
    Next i
    Set Reg1 = Nothing
End Sub
Here is an excerpt from my file:
<!ELEMENT ProductType (#PCDATA) >
<!ELEMENT Invoices (InvoiceDetails+) >
<!ELEMENT Deal (DealNumber,DealType,DealParties) >
<!ELEMENT DealParty (PartyType,CustomerID,CustomerName,CentralCustomerID?,
LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >
<!ELEMENT Deals (Deal*) >
Currently, I'm matching:
Extract ProductType
<!ELEMENT ProductType (#PCDATA) >
Extract InvoiceDetails
<!ELEMENT Invoices (InvoiceDetails+) >
I also need to extract the following:
Extract Deal
<!ELEMENT Deal (DealNumber,DealType,DealParties) >
Extract DealParty (the ? characters and the carriage returns are throwing me off)
<!ELEMENT DealParty (PartyType,CustomerID,CustomerName,CentralCustomerID?,
LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >
Extract Deal
<!ELEMENT Deals (Deal*) >
Upvotes: 1
Views: 197
Reputation: 70923
Maybe I am missing something, but the following works for me. Sorry, I don't have VBA at hand right now, so this is VBScript; you will have to adapt it.
Option Explicit

Dim fileContents
fileContents = WScript.CreateObject("Scripting.FileSystemObject").OpenTextFile("input.xml").ReadAll

Dim matches
With New RegExp
    .Multiline = True
    .IgnoreCase = False
    .Global = True
    .Pattern = "<!ELEMENT\s+([^\s>]+)\s+([^>]*)\s*>"
    Set matches = .Execute(fileContents)
End With

Dim match
For Each match In matches
    WScript.Echo match.Submatches(0)
    WScript.Echo match.Submatches(1)
    WScript.Echo "---------------------------------------"
Next
As I see it, your main problem is that you apply the regular expression to the file one line at a time. A declaration that spans several lines, such as DealParty, can never match that way; run the pattern against the full text of the file instead.
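For reference, here is a rough VBA sketch of the same whole-file approach. I have not run it in Excel. The Sub name ImportFromDTDWholeFile, the variable sText, and the choice to write the element name to column 2 and its content model to column 3 are my own assumptions, not something taken from your code. It needs the same "Microsoft VBScript Regular Expressions 5.5" reference that your existing code already relies on.

```vba
' Sketch only: hypothetical Sub name and output columns (see note above).
Sub ImportFromDTDWholeFile()
    Dim sDTDFile As Variant
    Dim ffile As Long
    Dim sText As String
    Dim sModel As String
    Dim Reg1 As RegExp
    Dim M As Match
    Dim J As Long

    sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
                                           "Browse for file to be imported")
    If sDTDFile = False Then Exit Sub ' user cancelled

    ' Read the whole file into ONE string; do not split it into lines.
    ffile = FreeFile
    Open sDTDFile For Input Access Read As #ffile
    sText = Input$(LOF(ffile), #ffile)
    Close #ffile

    Set Reg1 = New RegExp
    With Reg1
        ' [^>]* also matches CR/LF, so declarations may span lines.
        .Pattern = "<!ELEMENT\s+([^\s>]+)\s+([^>]*)\s*>"
        .Global = True
        .IgnoreCase = False
    End With

    Cells(1, 2) = "From DTD"
    J = 2
    For Each M In Reg1.Execute(sText)
        ' SubMatches(0) = element name, SubMatches(1) = content model.
        sModel = Replace(Replace(M.SubMatches(1), vbCr, ""), vbLf, "")
        Cells(J, 2) = M.SubMatches(0)
        Cells(J, 3) = Trim$(sModel)
        J = J + 1
    Next M
    Set Reg1 = Nothing
End Sub
```

With the excerpt from the question, this should produce one row per <!ELEMENT ...> declaration, including DealParty, because the line breaks inside its parentheses are consumed by [^>]* and then stripped before the value is written to the sheet.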
Upvotes: 1