J. G.
J. G.

Reputation: 1832

VBA macro is creating "utf-8" xml,but not really

Example of corrupted text:

Prokofiev, Sergey, 1891-1953. | Simfonii︠a︡-kont︠s︡ert

to

Prokofiev, Sergey, 1891-1953. | Simfonii?a?-kont?s?ert

So I'm using a vba macro to transfer what was originally a google spreadsheet to xml via excel. Although I tried telling excel in advanced options to save in utf-8, and although the xml is being printed in utf 8, obviously something is incorrect. This is the vba, which I believe I got from stackoverflow months ago in the first place:

Sub MakeXML(iCaptionRow As Integer, iDataStartRow As Integer, sOutputFileName As String)
    Dim Q As String
    Q = Chr$(34)

    Dim sXML As String

    sXML = "<?xml version=" & Q & "1.0" & Q & " encoding=" & Q & "UTF-8" & Q & "?>"
    sXML = sXML & "<rows>"


    ''--determine count of columns
    Dim iColCount As Integer
    iColCount = 1
    While Trim$(Cells(iCaptionRow, iColCount)) > ""
        iColCount = iColCount + 1
    Wend

    Dim iRow As Integer
    iRow = iDataStartRow

    While Cells(iRow, 1) > ""
        sXML = sXML & "<row id=" & Q & iRow & Q & ">"

        For icol = 1 To iColCount - 1
           sXML = sXML & "<" & Trim$(Cells(iCaptionRow, icol)) & ">"
           sXML = sXML & Trim$(Cells(iRow, icol))
           sXML = sXML & "</" & Trim$(Cells(iCaptionRow, icol)) & ">"
        Next

        sXML = sXML & "</row>"
        iRow = iRow + 1
    Wend
    sXML = sXML & "</rows>"

    Dim nDestFile As Integer, sText As String

    ''Close any open text files
    Close

    ''Get the number of the next free text file
    nDestFile = FreeFile

    ''Write the entire file to sText
    Open sOutputFileName For Output As #nDestFile
    Print #nDestFile, sXML
    Close
End Sub

Sub test()
    MakeXML 1, 2, "C:\Users\Adam Horvath\Documents\~CODE\prokooutputtitleUTF8.xml"
End Sub

I think that the solution involves writing to the xml file in a different way but what that way is, not sure.

Upvotes: 0

Views: 2748

Answers (1)

Parfait
Parfait

Reputation: 107567

Consider using VBA's MSXML object to build your XML document and not concatenate string values to build XML, even avoiding a text file dump. In this approach, you have procedures to createElement(), creatAttribute(), appendChild(), and xmldoc.Save(). And specifically for your needs, the createProcessingInstruction() allowing you to specify encoding. Do note though the standard is always version as 1.0 and encoding as UTF-8. So processing instruction here may be redundant.

Right now, your text file dumps with UTF-8 specified but may not be a full XML format encoding but the default ANSI text format. In fact, you could specify anything and text dump would comply but would fail using an XML object.

Before the ending, an XSLT Identity Transform is added to pretty print the output with line breaks and indentation to avoid the one line raw output of the XML document. You will notice such an XSLT is a string representation but is load to a proper XML document. Otherwise, you can load XSLT externally as .xsl file (which by the way is a well-formed .xml).

Sub MakeXML(iCaptionRow As Integer, iDataStartRow As Integer, sOutputFileName As String)
    ' REFERENCE: Microsoft XML V6.0
    Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
    Dim pi As MSXML2.IXMLDOMProcessingInstruction
    Dim root As IXMLDOMElement, rowNode As IXMLDOMElement, loopNode As IXMLDOMElement
    Dim idAttrib As IXMLDOMAttribute

    ' PROCESSING INSTRUCTION
    Set pi = doc.createProcessingInstruction("xml", " version=""1.0"" encoding=""UTF-8""")
    doc.appendChild pi

    ' DECLARE XML DOC OBJECT
    Set root = doc.createElement("rows")
    doc.appendChild root

    ''--determine count of columns
    Dim iColCount As Integer
    iColCount = Cells(7, Columns.Count).End(xlToLeft).Column

    Dim iRow As Integer
    iRow = iDataStartRow

    Dim icol As Integer

    While Cells(iRow, 1) > ""

        ' ROW NODE
        Set rowNode = doc.createElement("row")
        root.appendChild rowNode

        ' ID ATTRIBUTE
        Set idAttrib = doc.createAttribute("id")
        idAttrib.Value = iRow
        rowNode.setAttributeNode idAttrib

        ' LOOP NODE
        For icol = 1 To iColCount - 1
            Set loopNode = doc.createElement(Trim$(Cells(iCaptionRow, icol)))
            loopNode.Text = Trim$(Cells(iRow, icol))
            rowNode.appendChild loopNode
        Next icol

        iRow = iRow + 1
    Wend

    ' PRETTY PRINT RAW OUTPUT
    xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
            & "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
            & "                xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
            & "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
            & "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
            & "            encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
            & " <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
            & "  <xsl:copy>" _
            & "   <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
            & "  </xsl:copy>" _
            & " </xsl:template>" _
            & "</xsl:stylesheet>"

    xslDoc.async = False
    doc.transformNodeToObject xslDoc, newDoc
    newDoc.Save sOutputFileName

End Sub

Upvotes: 1

Related Questions