Reputation: 631
Is there a way to take HTML and import it to excel so that it is formatted as rich text (preferably by using VBA)? Basically, when I paste to an Excel cell, I'm looking to turn this:
<html><p>This is a test. Will this text be <b>bold</b> or <i>italic</i></p></html>
into this:
This is a test. Will this text be bold or italic
Upvotes: 50
Views: 274278
Reputation: 149287
Yes it is possible. In fact let Internet Explorer do the dirty work for you.
MY ASSUMPTIONS
CODE
Sub Sample()
Dim Ie As Object
Set Ie = CreateObject("InternetExplorer.Application")
With Ie
.Visible = False
.Navigate "about:blank"
.document.body.InnerHTML = Sheets("Sheet1").Range("A1").Value
.document.body.createtextrange.execCommand "Copy"
ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("A1")
.Quit
End With
End Sub
SNAPSHOT
Upvotes: 31
Reputation: 855
To put HTML/Word in an Excel Shape and locate it on an Excel Cell:
In this way, even HTML with tables and other stuff does not get split over multiple cells.
private void btnPutHTMLIntoExcelShape_Click(object sender, EventArgs e)
{
var fFile = new FileInfo(@"C:\Temp\temp.html");
StreamWriter SW = fFile.CreateText();
SW.Write(hecNote.DocumentHtml);
SW.Close();
Word.Application wrdApplication;
Word.Document wrdDocument;
wrdApplication = new Word.Application();
wrdApplication.Visible = true;
wrdDocument = wrdApplication.Documents.Add(@"C:\Temp\temp.html");
wrdDocument.ActiveWindow.Selection.WholeStory();
wrdDocument.ActiveWindow.Selection.Copy();
Excel.Application excApplication;
Excel.Workbook excWorkbook;
Excel._Worksheet excWorksheet;
Excel.Range excRange = null;
excApplication = new Excel.Application();
excApplication.Visible = true;
excWorkbook = excApplication.Workbooks.Add(Type.Missing);
excWorksheet = (Excel.Worksheet)excWorkbook.Worksheets.get_Item(1);
excWorksheet.Name = "Work";
excRange = excWorksheet.get_Range("A1");
excRange.Select();
excWorksheet.PasteSpecial("Microsoft Word Document Object");
Excel.Shape O = excWorksheet.Shapes.Item(1);
this.Text = $"{O.Height} x {O.Width}";
((Excel.Range)excWorksheet.Rows[1, Type.Missing]).RowHeight = O.Height;
}
Upvotes: 1
Reputation: 61
Nice! Very slick.
I was disappointed that Excel doesn't let us paste to a merged cell and also pastes results containing a break into successive rows below the "target" cell though, as that meant it simply doesn't work for me. I tried a few tweaks (unmerge/remerge, etc.) but then Excel dropped anything below a break, so that was a dead end.
Ultimately, I came up with a routine that'll handle simple tags and not use the "native" Unicode converter that is causing the issue with merged fields. Hope others find this useful:
Public Sub AddHTMLFormattedText(rngA As Range, strHTML As String, Optional blnShowBadHTMLWarning As Boolean = False)
' Adds converts text formatted with basic HTML tags to formatted text in an Excel cell
' NOTE: Font Sizes not handled perfectly per HTML standard, but I find this method more useful!
Dim strActualText As String, intSrcPos As Integer, intDestPos As Integer, intDestSrcEquiv() As Integer
Dim varyTags As Variant, varTag As Variant, varEndTag As Variant, blnTagMatch As Boolean
Dim intCtr As Integer
Dim intStartPos As Integer, intEndPos As Integer, intActualStartPos As Integer, intActualEndPos As Integer
Dim intFontSizeStartPos As Integer, intFontSizeEndPos As Integer, intFontSize As Integer
varyTags = Array("<b>", "</b>", "<i>", "</i>", "<u>", "</u>", "<sub>", "</sub>", "<sup>", "</sup>")
' Remove unhandled/unneeded tags, convert <br> and <p> tags to line feeds
strHTML = Trim(strHTML)
strHTML = Replace(strHTML, "<html>", "")
strHTML = Replace(strHTML, "</html>", "")
strHTML = Replace(strHTML, "<p>", "")
While LCase(Right$(strHTML, 4)) = "</p>" Or LCase(Right$(strHTML, 4)) = "<br>"
strHTML = Left$(strHTML, Len(strHTML) - 4)
strHTML = Trim(strHTML)
Wend
strHTML = Replace(strHTML, "<br>", vbLf)
strHTML = Replace(strHTML, "</p>", vbLf)
strHTML = Trim(strHTML)
ReDim intDestSrcEquiv(1 To Len(strHTML))
strActualText = ""
intSrcPos = 1
intDestPos = 1
Do While intSrcPos <= Len(strHTML)
blnTagMatch = False
For Each varTag In varyTags
If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
blnTagMatch = True
intSrcPos = intSrcPos + Len(varTag)
If intSrcPos > Len(strHTML) Then Exit Do
Exit For
End If
Next
If blnTagMatch = False Then
varTag = "<font size"
If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
blnTagMatch = True
intEndPos = InStr(intSrcPos, strHTML, ">")
intSrcPos = intEndPos + 1
If intSrcPos > Len(strHTML) Then Exit Do
Else
varTag = "</font>"
If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
blnTagMatch = True
intSrcPos = intSrcPos + Len(varTag)
If intSrcPos > Len(strHTML) Then Exit Do
End If
End If
End If
If blnTagMatch = False Then
strActualText = strActualText & Mid$(strHTML, intSrcPos, 1)
intDestSrcEquiv(intSrcPos) = intDestPos
intDestPos = intDestPos + 1
intSrcPos = intSrcPos + 1
End If
Loop
' Clear any bold/underline/italic/superscript/subscript formatting from cell
rngA.Font.Bold = False
rngA.Font.Underline = False
rngA.Font.Italic = False
rngA.Font.Subscript = False
rngA.Font.Superscript = False
rngA.Value = strActualText
' Now start applying Formats!"
' Start with Font Size first
intSrcPos = 1
intDestPos = 1
Do While intSrcPos <= Len(strHTML)
varTag = "<font size"
If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
intFontSizeStartPos = InStr(intSrcPos, strHTML, """") + 1
intFontSizeEndPos = InStr(intFontSizeStartPos, strHTML, """") - 1
If intFontSizeEndPos - intFontSizeStartPos <= 3 And intFontSizeEndPos - intFontSizeStartPos > 0 Then
Debug.Print Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
If Mid$(strHTML, intFontSizeStartPos, 1) = "+" Then
intFontSizeStartPos = intFontSizeStartPos + 1
intFontSize = 11 + 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
ElseIf Mid$(strHTML, intFontSizeStartPos, 1) = "-" Then
intFontSizeStartPos = intFontSizeStartPos + 1
intFontSize = 11 - 2 * Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
Else
intFontSize = Mid$(strHTML, intFontSizeStartPos, intFontSizeEndPos - intFontSizeStartPos + 1)
End If
Else
' Error!
GoTo HTML_Err
End If
intEndPos = InStr(intSrcPos, strHTML, ">")
intSrcPos = intEndPos + 1
intStartPos = intSrcPos
If intSrcPos > Len(strHTML) Then Exit Do
While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
intStartPos = intStartPos + 1
Wend
If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
varEndTag = "</font>"
intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
intEndPos = intEndPos - 1
Wend
If intEndPos > intSrcPos Then
intActualStartPos = intDestSrcEquiv(intStartPos)
intActualEndPos = intDestSrcEquiv(intEndPos)
rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1) _
.Font.Size = intFontSize
End If
End If
intSrcPos = intSrcPos + 1
Loop
'Now do remaining tags
intSrcPos = 1
intDestPos = 1
Do While intSrcPos <= Len(strHTML)
If intDestSrcEquiv(intSrcPos) = 0 Then
' This must be a Tag!
For intCtr = 0 To UBound(varyTags) Step 2
varTag = varyTags(intCtr)
intStartPos = intSrcPos + Len(varTag)
While intDestSrcEquiv(intStartPos) = 0 And intStartPos < Len(strHTML)
intStartPos = intStartPos + 1
Wend
If intStartPos >= Len(strHTML) Then GoTo HTML_Err ' HTML is bad!
If LCase(Mid$(strHTML, intSrcPos, Len(varTag))) = varTag Then
varEndTag = varyTags(intCtr + 1)
intEndPos = InStr(intSrcPos, LCase(strHTML), varEndTag)
If intEndPos = 0 Then GoTo HTML_Err ' HTML is bad!
While intDestSrcEquiv(intEndPos) = 0 And intEndPos > intSrcPos
intEndPos = intEndPos - 1
Wend
If intEndPos > intSrcPos Then
intActualStartPos = intDestSrcEquiv(intStartPos)
intActualEndPos = intDestSrcEquiv(intEndPos)
With rngA.Characters(intActualStartPos, intActualEndPos - intActualStartPos + 1).Font
If varTag = "<b>" Then
.Bold = True
ElseIf varTag = "<i>" Then
.Italic = True
ElseIf varTag = "<u>" Then
.Underline = True
ElseIf varTag = "<sup>" Then
.Superscript = True
ElseIf varTag = "<sub>" Then
.Subscript = True
End If
End With
End If
intSrcPos = intSrcPos + Len(varTag) - 1
Exit For
End If
Next
End If
intSrcPos = intSrcPos + 1
intDestPos = intDestPos + 1
Loop
Exit_Sub:
Exit Sub
HTML_Err:
' There was an error with the Tags. Show warning if requested.
If blnShowBadHTMLWarning Then
MsgBox "There was an error with the Tags in the HTML file. Could not apply formatting."
End If
End Sub
Note this doesn't care about tag nesting, instead only requiring a close tag for every open tag, and assuming the close tag nearest the opening tag applies to the opening tag. Properly nested tags will work fine, while improperly nested tags will not be rejected and may or may not work.
Upvotes: 5
Reputation: 187
I ran into the same error that BornToCode first identified in the comments of the original solution. Being unfamiliar with Excel and VBA it took me a second to figure out how to implement tiQU's solution. So I'm posting it as a "For Dummies" solution below
Sub Sample()
Dim Ie As Object
Set Ie = CreateObject("InternetExplorer.Application")
With Ie
.Visible = False
.Navigate "about:blank"
.document.body.InnerHTML = Sheets("Sheet1").Range("I2").Value
'update to the cell that contains HTML you want converted
.ExecWB 17, 0
'Select all contents in browser
.ExecWB 12, 2
'Copy them
ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("J2")
'update to cell you want converted HTML pasted in
.Quit
End With
End Sub
Upvotes: 17
Reputation: 33145
You can copy the HTML code to the clipboard and paste special it back as Unicode text. Excel will render the HTML in the cell. Check out this post http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/
The relevant macro code from the post:
Private Sub Worksheet_Change(ByVal Target As Range)
Dim objData As DataObject
Dim sHTML As String
Dim sSelAdd As String
Application.EnableEvents = False
If Target.Cells.Count = 1 Then
If LCase(Left(Target.Text, 6)) = "<html>" Then
Set objData = New DataObject
sHTML = Target.Text
objData.SetText sHTML
objData.PutInClipboard
sSelAdd = Selection.Address
Target.Select
Me.PasteSpecial "Unicode Text"
Me.Range(sSelAdd).Select
End If
End If
Application.EnableEvents = True
End Sub
Upvotes: 11
Reputation: 87
I know this thread is ancient, but after assigning the innerHTML, ExecWB worked for me:
.ExecWB 17, 0
'Select all contents in browser
.ExecWB 12, 2
'Copy them
And then just paste the contents into Excel. Since these methods are prone to runtime errors, but work fine after one or two tries in debug mode, you might have to tell Excel to try again if it runs into an error. I solved this by adding this error handler to the sub, and it works fine:
Sub ApplyHTML()
On Error GoTo ErrorHandler
...
Exit Sub
ErrorHandler:
Resume
'I.e. re-run the line of code that caused the error
Exit Sub
End Sub
Upvotes: 9
Reputation: 2788
If the IE example doesn't work use this one. Anyway this should be faster than starting up an instance of IE.
Here is a complete solution based on
http://www.dailydoseofexcel.com/archives/2005/02/23/html-in-cells-ii/
Note, if your innerHTML is all numbers eg '12345', HTML formatting dosen't fully work in excel as it treats number differently? but add a character eg a trailing space at the end eg. 12345 + "& nbsp;" formats ok.
Sub test()
Cells(1, 1).Value = "<HTML>1<font color=blue>a</font>" & _
"23<font color=red>4</font></HTML>"
Dim rng As Range
Set rng = ActiveSheet.Cells(1, 1)
Worksheet_Change rng, ActiveSheet
End Sub
Private Sub Worksheet_Change(ByVal Target As Range, ByVal sht As Worksheet)
Dim objData As DataObject ' Set a reference to MS Forms 2.0
Dim sHTML As String
Dim sSelAdd As String
Application.EnableEvents = False
If Target.Cells.Count = 1 Then
Set objData = New DataObject
sHTML = Target.Text
objData.SetText sHTML
objData.PutInClipboard
Target.Select
sht.PasteSpecial Format:="Unicode Text"
End If
Application.EnableEvents = True
End Sub
Upvotes: 10