kaless
kaless

Reputation: 3

How to read/interpret html tags in Excel within a single cell?

So here's my problem, I got a metric ton of html code in an Excel colunm. Something like that:

HTML in Excel cell

<p>Alerte suite &#224; 1 vrt de EUR XXXXXX en provenance du YYYYY.</p>
<p>&#160;</p>
<p>Thing</p>
<p>b&#233;n&#233;ficiaire : Someone</p>
<p>&#160;</p>
<p>Flux inf&#233;rieur &#224; ZZZZZ EUR, ne n&#233;cessitant pas d'investigation.<br /> D&#233;cision du 5/7/2012</p>

So i wanted to treat it, in way that the info will stay in one cell. In my serch I saw this post Stack Question: Read html in Exel

But it was old, and the responce of @BornToCode from 2014, recommend to use a macro to automaticly copy/paste, however it turns the information into many cell.

On my side i use the following macro to "sorta" fix my issue

Code VBA

Sub suppHTML()
'Selection As Range For Each cell In Selection
cell.Select
Call supphtmlinCell
Next cell
End Sub

Sub supphtmlinCell()
Dim strPattern0 As String: strPattern0 = "</p>"
Dim strReplace0 As String: strReplace0 = vbNewLine
Dim regEx0 As New RegExp
Dim strInput0 As String

Dim strPattern As String: strPattern = "<.*?>"
Dim strReplace As String: strReplace = " "
Dim regEx As New RegExp
Dim strInput As String

Dim strPattern1 As String: strPattern1 = "&#160;"
Dim strReplace1 As String: strReplace1 = " "
Dim regEx1 As New RegExp
Dim strInput1 As String

Dim strPattern2 As String: strPattern2 = "&#233;"
Dim strReplace2 As String: strReplace2 = "é"
Dim regEx2 As New RegExp
Dim strInput2 As String

Dim strPattern3 As String: strPattern3 = "&#232;"
Dim strReplace3 As String: strReplace3 = "è"
Dim regEx3 As New RegExp
Dim strInput3 As String

Dim strPattern4 As String: strPattern4 = "&#231;"
Dim strReplace4 As String: strReplace4 = "ç"
Dim regEx4 As New RegExp
Dim strInput4 As String

Dim strPattern5 As String: strPattern5 = "&#235;"
Dim strReplace5 As String: strReplace5 = "ë"
Dim regEx5 As New RegExp
Dim strInput5 As String

Dim strPattern6 As String: strPattern6 = "&#224;"
Dim strReplace6 As String: strReplace6 = "à"
Dim regEx6 As New RegExp
Dim strInput6 As String

Dim strPattern7 As String: strPattern7 = "&gt;"
Dim strReplace7 As String: strReplace7 = ">"
Dim regEx7 As New RegExp
Dim strInput7 As String

Dim strPattern8 As String: strPattern8 = "&lt;"
Dim strReplace8 As String: strReplace8 = "<"
Dim regEx8 As New RegExp
Dim strInput8 As String

Dim strPattern9 As String: strPattern9 = "&amp;"
Dim strReplace9 As String: strReplace9 = "&"
Dim regEx9 As New RegExp
Dim strInput9 As String


    If strPattern0 <> "" Then
        strInput0 = ActiveCell.Offset(0, 0).Value


        With regEx0
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern0
        End With

        If regEx0.Test(strInput0) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx0.Replace(strInput0, strReplace0)
        Else
            'MsgBox ("Not matched")
        End If
    End If


'ActiveCell.Offset(0, 0).Value = 2
If strPattern <> "" Then
        strInput = ActiveCell.Offset(0, 0).Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx.Replace(strInput, strReplace)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern1 <> "" Then
        strInput1 = ActiveCell.Offset(0, 0).Value


        With regEx1
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern1
        End With

        If regEx1.Test(strInput1) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx1.Replace(strInput1, strReplace1)
        Else
            'MsgBox ("Not matched")
        End If
    End If





    If strPattern2 <> "" Then
        strInput2 = ActiveCell.Offset(0, 0).Value


        With regEx2
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern2
        End With

        If regEx2.Test(strInput2) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx2.Replace(strInput2, strReplace2)
        Else
            'MsgBox ("Not matched")
        End If
    End If



    If strPattern3 <> "" Then
        strInput3 = ActiveCell.Offset(0, 0).Value


        With regEx3
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern3
        End With

        If regEx3.Test(strInput3) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx3.Replace(strInput3, strReplace3)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern4 <> "" Then
        strInput4 = ActiveCell.Offset(0, 0).Value


        With regEx4
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern4
        End With

        If regEx4.Test(strInput4) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx4.Replace(strInput4, strReplace4)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern5 <> "" Then
        strInput5 = ActiveCell.Offset(0, 0).Value


        With regEx5
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern5
        End With

        If regEx5.Test(strInput5) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx5.Replace(strInput5, strReplace5)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern6 <> "" Then
        strInput6 = ActiveCell.Offset(0, 0).Value


        With regEx6
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern6
        End With

        If regEx6.Test(strInput6) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx6.Replace(strInput6, strReplace6)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern7 <> "" Then
        strInput7 = ActiveCell.Offset(0, 0).Value


        With regEx7
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern7
        End With

        If regEx7.Test(strInput7) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx7.Replace(strInput7, strReplace7)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern8 <> "" Then
        strInput8 = ActiveCell.Offset(0, 0).Value


        With regEx8
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern8
        End With

        If regEx8.Test(strInput8) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx8.Replace(strInput8, strReplace8)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern9 <> "" Then
        strInput9 = ActiveCell.Offset(0, 0).Value


        With regEx9
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern9
        End With

        If regEx9.Test(strInput9) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx9.Replace(strInput9, strReplace9)
        Else
            'MsgBox ("Not matched")
        End If
    End If     
End Sub

So my solution is to find string i don't like and replace it by some other string that I prefer. But as you can all see, this solution is less than optimal. For instance now I'm still missing the way to interpret "à", "â" or "ô". And I'll need to edit the macro for every new string that I want to deal with.

So here I am looking for a better solution than mine, and I'm fairly confident a cleaner solution exist since Excel can interpret HTML through copy/paste.

Upvotes: 0

Views: 635

Answers (2)

kaless
kaless

Reputation: 3

Here the ajustement of Rik Sportel code, in case someone arrive on that page with the same issue as mine:

Sub supphtml()

u = Selection.Address
For Each cel In Range(u)

    With CreateObject("htmlfile")
        .Open
        .write cel.Value
        .Close
        cel.Value = .body.outerText
    End With

Next cel
End Sub

This will transform all cells that were selected before running the macro

Upvotes: 0

Rik Sportel
Rik Sportel

Reputation: 2679

How about just using the HTML file class?

Let's say your sample strings are in Range A1:A6 on Worksheets("Sheet1"):

Sub test()
Dim i As Integer

For i = 1 To 6 'Rows 1 to 6 hold your HTML formatted values

    With CreateObject("htmlfile")
        .Open
        .write Worksheets("Sheet1").Range("A" & i).Value
        .Close
        Worksheets("Sheet1").Range("B" & i).Value = .body.outerText
    End With

Next i
End Sub

Upvotes: 1

Related Questions