Reputation: 790
I'm parsing some HTML to translate it into an OpenXML xlsx file, but I haven't been able to extract the style attribute. I could brute-force this with a custom parser, but I was hoping to use mshtml as much as possible. The source HTML may have some non-standard formatting. Here are the details:
(below: input, code, and debug output)
input string:
<div id="GLGV" class="GLVG1">
<div class="GLGVOuterRow" ID="GLGV_PRTS_0" style="height:20px;">
<span id="ExtID01_0000" title="Note - N0001" class="ExtID01Label">N0001</span>
<span id="Note01" class="Note01" style="display:inline-block;width:70px;">Area Name</span>
<span id="Main01" class="MainTextAll" style="display:inline-block;height:16px;width:250px;">My new area</span>
<span id="OTLID_0" class="GRPL_Hidden">8270</span>
<span id="OTLParID_0" class="GRPL_Hidden">8269</span>
<span id="PrtTyp_0" class="GRPL_Hidden">NOTE</span>
<span class="FloatClear"></span>
</div>
ASP.NET code (VB.NET):
Public Sub TestSample()
    Dim wrkListString As String = C.AC("List")
    Dim wrkDocument As IHTMLDocument2 = New HTMLDocumentClass()
    wrkDocument.write(wrkListString)
    wrkDocument.close()
    Dim wrkAllElements As IHTMLElementCollection = wrkDocument.body.all
    Dim ws As String = ""
    Dim wrkType As String = ""
    Dim wrkStyle As String = ""
    Dim wrkId As String = ""
    Dim wrkClass As String = ""
    For Each wrkElem In wrkAllElements
        wrkType = wrkElem.GetType().ToString
        wrkId = wrkElem.id
        wrkClass = wrkElem.className
        wrkStyle = wrkElem.Style.ToString
        ws = wrkType & " , " & wrkId & " , " & wrkClass & " , " & wrkStyle & " , "
        Debug.Print(ws)
    Next
End Sub
Debug output:
mshtml.HTMLDivElementClass , GLGV , GLVG1 , System.__ComObject ,
mshtml.HTMLDivElementClass , GLGV_PRTS_0 , GLGVOuterRow , System.__ComObject ,
mshtml.HTMLSpanElementClass , ExtID01_0000 , ExtID01Label , System.__ComObject ,
mshtml.HTMLSpanElementClass , Note01 , Note01 , System.__ComObject ,
mshtml.HTMLSpanElementClass , Main01 , MainTextAll , System.__ComObject ,
mshtml.HTMLSpanElementClass , OTLID_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , OTLParID_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , PrtTyp_0 , GRPL_Hidden , System.__ComObject ,
mshtml.HTMLSpanElementClass , , FloatClear , System.__ComObject ,
I don't see the detailed style for the span with id="Main01" (or for any other element), only "System.__ComObject".
Any help with getting the detailed inline style string would be appreciated. Thanks!
Upvotes: 2
Views: 2703
Reputation: 1546
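The Style property of wrkElem returns an IHTMLStyle object, not a string. Calling ToString on it only gives you the type name of the COM wrapper, which is why you see "System.__ComObject". Use the cssText property of the IHTMLStyle object to retrieve the inline style text instead.
To apply this, change this line: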
wrkStyle = wrkElem.Style.ToString
To this:
wrkStyle = wrkElem.Style.cssText
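Once you have the cssText string, you will probably want the individual declarations (width, height, and so on) for the xlsx translation. Below is a minimal sketch of one way to split it; ParseCssText is a hypothetical helper, not part of mshtml. It assumes simple "name: value;" declarations with no semicolons inside values, and it uses a case-insensitive dictionary because mshtml may return the property names in a different case than the source. It also tolerates a Nothing or empty string, which you may get for elements without an inline style.

```vbnet
Imports System
Imports System.Collections.Generic

Module CssTextDemo
    ' Hypothetical helper (not an mshtml API): splits "name: value; name: value"
    ' into a dictionary. Assumes values contain no embedded semicolons.
    Function ParseCssText(cssText As String) As Dictionary(Of String, String)
        ' Case-insensitive keys, since the casing of property names may vary.
        Dim result As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        If String.IsNullOrEmpty(cssText) Then Return result

        For Each declaration As String In cssText.Split(";"c)
            Dim colonIndex As Integer = declaration.IndexOf(":"c)
            ' Skip empty fragments (e.g. after a trailing ";") and malformed ones.
            If colonIndex <= 0 Then Continue For

            Dim name As String = declaration.Substring(0, colonIndex).Trim()
            Dim value As String = declaration.Substring(colonIndex + 1).Trim()
            If name.Length > 0 Then result(name) = value
        Next
        Return result
    End Function

    Sub Main()
        ' Sample string taken from the Main01 span in the question's input.
        Dim styles = ParseCssText("display:inline-block;height:16px;width:250px;")
        Console.WriteLine(styles("width"))
    End Sub
End Module
```

In the loop from the question you would call it as ParseCssText(wrkElem.Style.cssText) and then look up only the properties you need.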
Upvotes: 1