Reputation: 2355
Cells can contain a large amount of characters. I'm not sure about the limit but I am testing with 450+ characters. In VBA, I have no problem inserting the value of that cell in a string, reading it via debug.print, using functions on it such as Len(str) in order to find the character count.
My problem
The string I want to play with are HTML strings on which I apply the format and then remove HTML tags. The formats apply with no problem, using a macro I don't think is necessary to show (it's long) but when comes the time to remove HTML tags, I run into problems when the strings are higher than 255 characters.
Reproduce it yourself and see
Here is part of a piece of code to remove HTML tags regarding font color, adjusted to make the situation stand out. To use it, select a cell with HTML tags in it and run the code. BE CAREFUL - it will run an infinite loop when the length is greater than 255 characters, so step through with F8 and look at the debug.prints the first time. The deletions are simply skipped over without even any errors showing up.
Sub removeColorTags()
Dim i As Integer
Dim rng As Range
Dim str As String
Set rng = ActiveCell
i = InStr(rng.Value, "<font")
Do Until i = 0
Debug.Print Len(rng.Value)
str = rng.Value
Debug.Print str 'Displays correctly
rng.Characters(i, 20).Delete
i = InStr(rng.Value, "</font>")
rng.Characters(i, 7).Delete
i = InStr(rng.Value, "<font")
Loop
End Sub
Here is an example of what you can parse in a cell to try the code on to see it succeed without problem. It will remove the color tags but leave the size tags on. Make sure you get the whole line (250 characters)
<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>
Here is an example of what you can parse in a cell to try the code on to see it fail. Make sure you get the whole line (450 characters)
<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>
What I'd like
I'd like a way to remove the HTML tags on long strings. Doing it without reference to a cell (taking the value in a string, removing the tags with Replace or otherwise) doesn't work because when putting the value back, formatting is lost. The whole point of this is to format the cells.
Upvotes: 2
Views: 1186
Reputation: 2355
Here is what I ended up doing. First off, let's see a screenshot of input, transition information, and output. I'm starting from a normal formatted excel cell with text, then converting it to something similar to (but not quite like) HTML. This question was asking how I could remove substrings (the HTML tags) from the HTMl string (middle section of the screenshot) without losing formatting.
How this answers the question
I needed a way to remove substrings without losing the formatting on a cell that had more than 255 characters. This meant not using characters.insert
or characters.delete
, because as Tim Williams pointed out, they cause problems after 255 chars. So as a workaround I fragmented the input string between the substrings I wanted to remove, recorded the formatting they had, placed them back together, and reapplied the formats using characters(x,y).font
.
The sub I'm about to show scans the HTML string char by char and records it in a temporary string. When it encounters HTML tags, it stops recording the temporary string and records it in an array along with the formatting that was relevant for that temp string. It then reads the tags and changes the "current format" to what the HTML tags do, and starts recording again in a new temporary string. I will admit that the sub could be cut shorter by calling functions, but it works.
Sub FromHTML(rngToConvert As Range)
Dim i As Integer, j As Integer, k As Integer
Dim strHTML As String, strTemp As String
Dim rng As Range
Dim arr()
Dim lengthFormatted As Integer
Dim optBold As Boolean, optIta As Boolean, optUnd As Boolean, optCol As String, optSize As Integer
Dim inStrTemp As Boolean
Dim nbChars As Integer
Set rng = rngToConvert.Offset(0, 2)
rng.Clear
strHTML = rngToConvert.Value
If InStr(strHTML, "<") = 0 Then Exit Sub
ReDim arr(6, 0)
inStrTemp = False
strTemp = ""
optBold = False
optIta = False
optUnd = False
optCol = "0,0,0"
optSize = "11"
For i = 1 To Len(strHTML)
If Not Mid(strHTML, i, 1) = "<" And Not Mid(strHTML, i, 4) = "[LF]" Then
'All WANTED characters go here
strTemp = strTemp & Mid(strHTML, i, 1)
inStrTemp = True
If Len(strTemp) > 200 Or i = Len(strHTML) Then
'Cuts them shorter than 200 chars
'In retrospect this isn't necessary but doesn't interfere
ReDim Preserve arr(6, j)
arr(0, j) = strTemp
arr(1, j) = optBold
arr(2, j) = optIta
arr(3, j) = optUnd
arr(4, j) = optCol
arr(5, j) = optSize
arr(6, j) = Len(strTemp)
strTemp = ""
j = j + 1
End If
ElseIf Mid(strHTML, i, 4) = "[LF]" Then
'[LF] is what I used to indicate that there was a line change in the original text
ReDim Preserve arr(6, j)
arr(0, j) = strTemp
arr(1, j) = optBold
arr(2, j) = optIta
arr(3, j) = optUnd
arr(4, j) = optCol
arr(5, j) = optSize
arr(6, j) = Len(strTemp)
strTemp = ""
j = j + 1
strTemp = vbLf
inStrTemp = True
i = i + 3
ReDim Preserve arr(6, j)
arr(0, j) = strTemp
arr(1, j) = optBold
arr(2, j) = optIta
arr(3, j) = optUnd
arr(4, j) = optCol
arr(5, j) = optSize
arr(6, j) = Len(strTemp)
strTemp = ""
j = j + 1
Else
If inStrTemp = True Then
'Records the temporary string and the formats it used
ReDim Preserve arr(6, j)
arr(0, j) = strTemp
arr(1, j) = optBold
arr(2, j) = optIta
arr(3, j) = optUnd
arr(4, j) = optCol
arr(5, j) = optSize
arr(6, j) = Len(strTemp)
strTemp = ""
j = j + 1
inStrTemp = False
End If
'If we get here we hit a HTML tag, so we read it and skip to after it
If Mid(strHTML, i, 3) = "<b>" Then
optBold = True
i = i + 2
ElseIf Mid(strHTML, i, 4) = "</b>" Then
optBold = False
i = i + 3
ElseIf Mid(strHTML, i, 3) = "<i>" Then
optIta = True
i = i + 2
ElseIf Mid(strHTML, i, 4) = "</i>" Then
optIta = False
i = i + 3
ElseIf Mid(strHTML, i, 3) = "<u>" Then
optUnd = True
i = i + 2
ElseIf Mid(strHTML, i, 4) = "</u>" Then
optUnd = False
i = i + 3
ElseIf Mid(strHTML, i, 11) Like "<c=???????>" Then
'optCol = RED, GREEN, BLUE
optCol = CInt("&H" & Mid(strHTML, i + 4, 2)) & "," & _
CInt("&H" & Mid(strHTML, i + 6, 2)) & "," & _
CInt("&H" & Mid(strHTML, i + 8, 2))
i = i + 10
ElseIf Mid(strHTML, i, 6) Like "<s=??>" Then
optSize = CInt(Mid(strHTML, i + 3, 2))
i = i + 5
End If
End If
Next
'Filling the cell with unformatted text
For i = 0 To UBound(arr, 2)
'This debug.print shows the tempString that was recorded and the associated formats
Debug.Print arr(0, i) & " Bold=" & arr(1, i) & " Italic=" & arr(2, i) & " Underline=" & arr(3, i) & " RGB=" & arr(4, i) & " Size =" & arr(5, i)
rng.Value = rng.Value + arr(0, i)
Next
'Applying formats according to the arrays
nbChars = 1
For i = 0 To UBound(arr, 2)
If arr(0, i) = vbLf Then
nbChars = nbChars + 1
Else
rng.Characters(nbChars, arr(6, i)).Font.Bold = arr(1, i)
rng.Characters(nbChars, arr(6, i)).Font.Italic = arr(2, i)
rng.Characters(nbChars, arr(6, i)).Font.Underline = arr(3, i)
rng.Characters(nbChars, arr(6, i)).Font.Color = RGB(Split(arr(4, i), ",")(0), Split(arr(4, i), ",")(1), Split(arr(4, i), ",")(2))
rng.Characters(nbChars, arr(6, i)).Font.Size = CInt(arr(5, i))
nbChars = nbChars + arr(6, i)
End If
Next
End Sub
I feel like this sub is complex and the reason I wanted to answer with it is because it could help anyone trying to accomplish a similar goal. Of course, some adjustments will be needed. This is function I used to go from formatted text to HTML-like text. It's not part of the question but will help to understand the tags. It's based off a function I found online (though I cannot remember where). If you want to use both subs as-is, then be sure to remove the <html>
and </html>
tags at the beginning and end of the HTML string that this function puts.
Function fnConvert2HTML(myCell As Range) As String
Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn, sizTagOn As Boolean
Dim i, chrCount As Integer
Dim chrCol, chrLastCol, chrSiz, chrLastSiz, htmlTxt As String
bldTagOn = False
itlTagOn = False
ulnTagOn = False
colTagOn = False
sizTagOn = False
chrCol = "NONE"
htmlTxt = "<html>"
chrCount = myCell.Characters.Count
For i = 1 To chrCount
With myCell.Characters(i, 1)
'If (.Font.Color) Then
chrCol = fnGetCol(.Font.Color)
If chrCol <> chrLastCol Then
htmlTxt = htmlTxt & "<c=#" & chrCol & ">"
chrLastCol = chrCol
End If
'End If
If (.Font.Size) Then
chrSiz = .Font.Size
If Len(chrSiz) = 1 Then chrSiz = "0" & chrSiz
If Not chrLastSiz = chrSiz Then
htmlTxt = htmlTxt & "<s=" & chrSiz & ">"
End If
chrLastSiz = chrSiz
End If
If .Font.Bold = True Then
If Not bldTagOn Then
htmlTxt = htmlTxt & "<b>"
bldTagOn = True
End If
Else
If bldTagOn Then
htmlTxt = htmlTxt & "</b>"
bldTagOn = False
End If
End If
If .Font.Italic = True Then
If Not itlTagOn Then
htmlTxt = htmlTxt & "<i>"
itlTagOn = True
End If
Else
If itlTagOn Then
htmlTxt = htmlTxt & "</i>"
itlTagOn = False
End If
End If
If .Font.Underline > 0 Then
If Not ulnTagOn Then
htmlTxt = htmlTxt & "<u>"
ulnTagOn = True
End If
Else
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
End If
If (Asc(.Text) = 10) Then
htmlTxt = htmlTxt & "[LF]"
Else
htmlTxt = htmlTxt & .Text
End If
End With
Next
If bldTagOn Then
htmlTxt = htmlTxt & "</b>"
bldTagOn = False
End If
If itlTagOn Then
htmlTxt = htmlTxt & "</i>"
itlTagOn = False
End If
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
htmlTxt = htmlTxt & "</html>"
fnConvert2HTML = htmlTxt
End Function
Upvotes: 1
Reputation: 3540
You may loop through all characters on the cell, check whether you`re not inside and html tag and add the text to another cell. The following code will do it (considering input from cell A1 and output to cell A2, both on activesheet):
Sub RemoveHtmlTags()
Dim charaux As String
Dim insideHTMLtag As Boolean
insideHTMLtag = False
For i = 1 To Len(ActiveSheet.Cells(1, 1).Value)
charaux = Mid(ActiveSheet.Cells(1, 1).Value, i, 1)
If (charaux = "<") Then
insideHTMLtag = True
ElseIf (charaux = ">") Then
insideHTMLtag = False
ElseIf (Not insideHTMLtag) Then
ActiveSheet.Cells(2, 1).Value = ActiveSheet.Cells(2, 1).Value + Mid(ActiveSheet.Cells(1, 1).Value, i, 1)
End If
Next i
End Sub
I`ve tested it on Excel 2016 for Mac and it works fine.
Upvotes: 0