Reputation: 1832
Example of corrupted text:
Prokofiev, Sergey, 1891-1953. | Simfonii︠a︡-kont︠s︡ert
to
Prokofiev, Sergey, 1891-1953. | Simfonii?a?-kont?s?ert
So I'm using a vba macro to transfer what was originally a google spreadsheet to xml via excel. Although I tried telling excel in advanced options to save in utf-8, and although the xml is being printed in utf 8, obviously something is incorrect. This is the vba, which I believe I got from stackoverflow months ago in the first place:
Sub MakeXML(iCaptionRow As Integer, iDataStartRow As Integer, sOutputFileName As String)
Dim Q As String
Q = Chr$(34)
Dim sXML As String
sXML = "<?xml version=" & Q & "1.0" & Q & " encoding=" & Q & "UTF-8" & Q & "?>"
sXML = sXML & "<rows>"
''--determine count of columns
Dim iColCount As Integer
iColCount = 1
While Trim$(Cells(iCaptionRow, iColCount)) > ""
iColCount = iColCount + 1
Wend
Dim iRow As Integer
iRow = iDataStartRow
While Cells(iRow, 1) > ""
sXML = sXML & "<row id=" & Q & iRow & Q & ">"
For icol = 1 To iColCount - 1
sXML = sXML & "<" & Trim$(Cells(iCaptionRow, icol)) & ">"
sXML = sXML & Trim$(Cells(iRow, icol))
sXML = sXML & "</" & Trim$(Cells(iCaptionRow, icol)) & ">"
Next
sXML = sXML & "</row>"
iRow = iRow + 1
Wend
sXML = sXML & "</rows>"
Dim nDestFile As Integer, sText As String
''Close any open text files
Close
''Get the number of the next free text file
nDestFile = FreeFile
''Write the entire file to sText
Open sOutputFileName For Output As #nDestFile
Print #nDestFile, sXML
Close
End Sub
Sub test()
MakeXML 1, 2, "C:\Users\Adam Horvath\Documents\~CODE\prokooutputtitleUTF8.xml"
End Sub
I think that the solution involves writing to the xml file in a different way but what that way is, not sure.
Upvotes: 0
Views: 2748
Reputation: 107567
Consider using VBA's MSXML object to build your XML document and not concatenate string values to build XML, even avoiding a text file dump. In this approach, you have procedures to createElement()
, creatAttribute()
, appendChild()
, and xmldoc.Save()
. And specifically for your needs, the createProcessingInstruction()
allowing you to specify encoding. Do note though the standard is always version as 1.0 and encoding as UTF-8. So processing instruction here may be redundant.
Right now, your text file dumps with UTF-8 specified but may not be a full XML format encoding but the default ANSI text format. In fact, you could specify anything and text dump would comply but would fail using an XML object.
Before the ending, an XSLT Identity Transform is added to pretty print the output with line breaks and indentation to avoid the one line raw output of the XML document. You will notice such an XSLT is a string representation but is load to a proper XML document. Otherwise, you can load XSLT externally as .xsl file (which by the way is a well-formed .xml).
Sub MakeXML(iCaptionRow As Integer, iDataStartRow As Integer, sOutputFileName As String)
' REFERENCE: Microsoft XML V6.0
Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
Dim pi As MSXML2.IXMLDOMProcessingInstruction
Dim root As IXMLDOMElement, rowNode As IXMLDOMElement, loopNode As IXMLDOMElement
Dim idAttrib As IXMLDOMAttribute
' PROCESSING INSTRUCTION
Set pi = doc.createProcessingInstruction("xml", " version=""1.0"" encoding=""UTF-8""")
doc.appendChild pi
' DECLARE XML DOC OBJECT
Set root = doc.createElement("rows")
doc.appendChild root
''--determine count of columns
Dim iColCount As Integer
iColCount = Cells(7, Columns.Count).End(xlToLeft).Column
Dim iRow As Integer
iRow = iDataStartRow
Dim icol As Integer
While Cells(iRow, 1) > ""
' ROW NODE
Set rowNode = doc.createElement("row")
root.appendChild rowNode
' ID ATTRIBUTE
Set idAttrib = doc.createAttribute("id")
idAttrib.Value = iRow
rowNode.setAttributeNode idAttrib
' LOOP NODE
For icol = 1 To iColCount - 1
Set loopNode = doc.createElement(Trim$(Cells(iCaptionRow, icol)))
loopNode.Text = Trim$(Cells(iRow, icol))
rowNode.appendChild loopNode
Next icol
iRow = iRow + 1
Wend
' PRETTY PRINT RAW OUTPUT
xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
& "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
& " xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
& "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
& "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
& " encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
& " <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
& " <xsl:copy>" _
& " <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
& " </xsl:copy>" _
& " </xsl:template>" _
& "</xsl:stylesheet>"
xslDoc.async = False
doc.transformNodeToObject xslDoc, newDoc
newDoc.Save sOutputFileName
End Sub
Upvotes: 1