Reputation: 3
So here's my problem, I got a metric ton of html code in an Excel colunm. Something like that:
<p>Alerte suite à 1 vrt de EUR XXXXXX en provenance du YYYYY.</p>
<p> </p>
<p>Thing</p>
<p>bénéficiaire : Someone</p>
<p> </p>
<p>Flux inférieur à ZZZZZ EUR, ne nécessitant pas d'investigation.<br /> Décision du 5/7/2012</p>
So i wanted to treat it, in way that the info will stay in one cell. In my serch I saw this post Stack Question: Read html in Exel
But it was old, and the responce of @BornToCode from 2014, recommend to use a macro to automaticly copy/paste, however it turns the information into many cell.
On my side i use the following macro to "sorta" fix my issue
Sub suppHTML()
'Selection As Range For Each cell In Selection
cell.Select
Call supphtmlinCell
Next cell
End Sub
Sub supphtmlinCell()
Dim strPattern0 As String: strPattern0 = "</p>"
Dim strReplace0 As String: strReplace0 = vbNewLine
Dim regEx0 As New RegExp
Dim strInput0 As String
Dim strPattern As String: strPattern = "<.*?>"
Dim strReplace As String: strReplace = " "
Dim regEx As New RegExp
Dim strInput As String
Dim strPattern1 As String: strPattern1 = " "
Dim strReplace1 As String: strReplace1 = " "
Dim regEx1 As New RegExp
Dim strInput1 As String
Dim strPattern2 As String: strPattern2 = "é"
Dim strReplace2 As String: strReplace2 = "é"
Dim regEx2 As New RegExp
Dim strInput2 As String
Dim strPattern3 As String: strPattern3 = "è"
Dim strReplace3 As String: strReplace3 = "è"
Dim regEx3 As New RegExp
Dim strInput3 As String
Dim strPattern4 As String: strPattern4 = "ç"
Dim strReplace4 As String: strReplace4 = "ç"
Dim regEx4 As New RegExp
Dim strInput4 As String
Dim strPattern5 As String: strPattern5 = "ë"
Dim strReplace5 As String: strReplace5 = "ë"
Dim regEx5 As New RegExp
Dim strInput5 As String
Dim strPattern6 As String: strPattern6 = "à"
Dim strReplace6 As String: strReplace6 = "à"
Dim regEx6 As New RegExp
Dim strInput6 As String
Dim strPattern7 As String: strPattern7 = ">"
Dim strReplace7 As String: strReplace7 = ">"
Dim regEx7 As New RegExp
Dim strInput7 As String
Dim strPattern8 As String: strPattern8 = "<"
Dim strReplace8 As String: strReplace8 = "<"
Dim regEx8 As New RegExp
Dim strInput8 As String
Dim strPattern9 As String: strPattern9 = "&"
Dim strReplace9 As String: strReplace9 = "&"
Dim regEx9 As New RegExp
Dim strInput9 As String
If strPattern0 <> "" Then
strInput0 = ActiveCell.Offset(0, 0).Value
With regEx0
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern0
End With
If regEx0.Test(strInput0) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx0.Replace(strInput0, strReplace0)
Else
'MsgBox ("Not matched")
End If
End If
'ActiveCell.Offset(0, 0).Value = 2
If strPattern <> "" Then
strInput = ActiveCell.Offset(0, 0).Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx.Replace(strInput, strReplace)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern1 <> "" Then
strInput1 = ActiveCell.Offset(0, 0).Value
With regEx1
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern1
End With
If regEx1.Test(strInput1) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx1.Replace(strInput1, strReplace1)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern2 <> "" Then
strInput2 = ActiveCell.Offset(0, 0).Value
With regEx2
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern2
End With
If regEx2.Test(strInput2) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx2.Replace(strInput2, strReplace2)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern3 <> "" Then
strInput3 = ActiveCell.Offset(0, 0).Value
With regEx3
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern3
End With
If regEx3.Test(strInput3) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx3.Replace(strInput3, strReplace3)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern4 <> "" Then
strInput4 = ActiveCell.Offset(0, 0).Value
With regEx4
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern4
End With
If regEx4.Test(strInput4) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx4.Replace(strInput4, strReplace4)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern5 <> "" Then
strInput5 = ActiveCell.Offset(0, 0).Value
With regEx5
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern5
End With
If regEx5.Test(strInput5) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx5.Replace(strInput5, strReplace5)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern6 <> "" Then
strInput6 = ActiveCell.Offset(0, 0).Value
With regEx6
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern6
End With
If regEx6.Test(strInput6) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx6.Replace(strInput6, strReplace6)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern7 <> "" Then
strInput7 = ActiveCell.Offset(0, 0).Value
With regEx7
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern7
End With
If regEx7.Test(strInput7) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx7.Replace(strInput7, strReplace7)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern8 <> "" Then
strInput8 = ActiveCell.Offset(0, 0).Value
With regEx8
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern8
End With
If regEx8.Test(strInput8) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx8.Replace(strInput8, strReplace8)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern9 <> "" Then
strInput9 = ActiveCell.Offset(0, 0).Value
With regEx9
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern9
End With
If regEx9.Test(strInput9) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx9.Replace(strInput9, strReplace9)
Else
'MsgBox ("Not matched")
End If
End If
End Sub
So my solution is to find string i don't like and replace it by some other string that I prefer. But as you can all see, this solution is less than optimal. For instance now I'm still missing the way to interpret "à", "â" or "ô". And I'll need to edit the macro for every new string that I want to deal with.
So here I am looking for a better solution than mine, and I'm fairly confident a cleaner solution exist since Excel can interpret HTML through copy/paste.
Upvotes: 0
Views: 635
Reputation: 3
Here the ajustement of Rik Sportel code, in case someone arrive on that page with the same issue as mine:
Sub supphtml()
u = Selection.Address
For Each cel In Range(u)
With CreateObject("htmlfile")
.Open
.write cel.Value
.Close
cel.Value = .body.outerText
End With
Next cel
End Sub
This will transform all cells that were selected before running the macro
Upvotes: 0
Reputation: 2679
How about just using the HTML file class?
Let's say your sample strings are in Range A1:A6 on Worksheets("Sheet1")
:
Sub test()
Dim i As Integer
For i = 1 To 6 'Rows 1 to 6 hold your HTML formatted values
With CreateObject("htmlfile")
.Open
.write Worksheets("Sheet1").Range("A" & i).Value
.Close
Worksheets("Sheet1").Range("B" & i).Value = .body.outerText
End With
Next i
End Sub
Upvotes: 1