David G
David G

Reputation: 2355

Dealing with character limits on Strings

Cells can contain a large amount of characters. I'm not sure about the limit but I am testing with 450+ characters. In VBA, I have no problem inserting the value of that cell in a string, reading it via debug.print, using functions on it such as Len(str) in order to find the character count.

My problem

The string I want to play with are HTML strings on which I apply the format and then remove HTML tags. The formats apply with no problem, using a macro I don't think is necessary to show (it's long) but when comes the time to remove HTML tags, I run into problems when the strings are higher than 255 characters.

Reproduce it yourself and see

Here is part of a piece of code to remove HTML tags regarding font color, adjusted to make the situation stand out. To use it, select a cell with HTML tags in it and run the code. BE CAREFUL - it will run an infinite loop when the length is greater than 255 characters, so step through with F8 and look at the debug.prints the first time. The deletions are simply skipped over without even any errors showing up.

Sub removeColorTags()
    Dim i As Integer
    Dim rng As Range
    Dim str As String
    Set rng = ActiveCell
        i = InStr(rng.Value, "<font")
    Do Until i = 0
    Debug.Print Len(rng.Value)
    str = rng.Value
    Debug.Print str 'Displays correctly
        rng.Characters(i, 20).Delete
        i = InStr(rng.Value, "</font>")
        rng.Characters(i, 7).Delete
        i = InStr(rng.Value, "<font")
    Loop
End Sub

Here is an example of what you can parse in a cell to try the code on to see it succeed without problem. It will remove the color tags but leave the size tags on. Make sure you get the whole line (250 characters)

<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>

Here is an example of what you can parse in a cell to try the code on to see it fail. Make sure you get the whole line (450 characters)

<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>

What I'd like

I'd like a way to remove the HTML tags on long strings. Doing it without reference to a cell (taking the value in a string, removing the tags with Replace or otherwise) doesn't work because when putting the value back, formatting is lost. The whole point of this is to format the cells.

Upvotes: 2

Views: 1186

Answers (2)

David G
David G

Reputation: 2355

Here is what I ended up doing. First off, let's see a screenshot of input, transition information, and output. I'm starting from a normal formatted excel cell with text, then converting it to something similar to (but not quite like) HTML. This question was asking how I could remove substrings (the HTML tags) from the HTMl string (middle section of the screenshot) without losing formatting.

enter image description here

How this answers the question

I needed a way to remove substrings without losing the formatting on a cell that had more than 255 characters. This meant not using characters.insert or characters.delete, because as Tim Williams pointed out, they cause problems after 255 chars. So as a workaround I fragmented the input string between the substrings I wanted to remove, recorded the formatting they had, placed them back together, and reapplied the formats using characters(x,y).font .

The sub I'm about to show scans the HTML string char by char and records it in a temporary string. When it encounters HTML tags, it stops recording the temporary string and records it in an array along with the formatting that was relevant for that temp string. It then reads the tags and changes the "current format" to what the HTML tags do, and starts recording again in a new temporary string. I will admit that the sub could be cut shorter by calling functions, but it works.

Sub FromHTML(rngToConvert As Range)
    Dim i As Integer, j As Integer, k As Integer
    Dim strHTML As String, strTemp As String
    Dim rng As Range
    Dim arr()
    Dim lengthFormatted As Integer
    Dim optBold As Boolean, optIta As Boolean, optUnd As Boolean, optCol As String, optSize As Integer
    Dim inStrTemp As Boolean
    Dim nbChars As Integer

    Set rng = rngToConvert.Offset(0, 2)
    rng.Clear
    strHTML = rngToConvert.Value

    If InStr(strHTML, "<") = 0 Then Exit Sub


    ReDim arr(6, 0)

    inStrTemp = False
    strTemp = ""
    optBold = False
    optIta = False
    optUnd = False
    optCol = "0,0,0"
    optSize = "11"

    For i = 1 To Len(strHTML)

        If Not Mid(strHTML, i, 1) = "<" And Not Mid(strHTML, i, 4) = "[LF]" Then
            'All WANTED characters go here
            strTemp = strTemp & Mid(strHTML, i, 1)
            inStrTemp = True

            If Len(strTemp) > 200 Or i = Len(strHTML) Then
                'Cuts them shorter than 200 chars
                'In retrospect this isn't necessary but doesn't interfere
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
            End If
        ElseIf Mid(strHTML, i, 4) = "[LF]" Then
            '[LF] is what I used to indicate that there was a line change in the original text
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1

            strTemp = vbLf
            inStrTemp = True
            i = i + 3

                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
        Else
             If inStrTemp = True Then
                'Records the temporary string and the formats it used
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
                inStrTemp = False
            End If

            'If we get here we hit a HTML tag, so we read it and skip to after it
            If Mid(strHTML, i, 3) = "<b>" Then
                optBold = True
                i = i + 2 
            ElseIf Mid(strHTML, i, 4) = "</b>" Then
                optBold = False
                i = i + 3
            ElseIf Mid(strHTML, i, 3) = "<i>" Then
                optIta = True
                i = i + 2
            ElseIf Mid(strHTML, i, 4) = "</i>" Then
                optIta = False
                i = i + 3
            ElseIf Mid(strHTML, i, 3) = "<u>" Then
                optUnd = True
                i = i + 2
            ElseIf Mid(strHTML, i, 4) = "</u>" Then
                optUnd = False
                i = i + 3
            ElseIf Mid(strHTML, i, 11) Like "<c=???????>" Then
                       'optCol = RED, GREEN, BLUE
                optCol = CInt("&H" & Mid(strHTML, i + 4, 2)) & "," & _
                        CInt("&H" & Mid(strHTML, i + 6, 2)) & "," & _
                        CInt("&H" & Mid(strHTML, i + 8, 2))
                i = i + 10
            ElseIf Mid(strHTML, i, 6) Like "<s=??>" Then
                optSize = CInt(Mid(strHTML, i + 3, 2))
                i = i + 5
            End If
        End If
    Next

    'Filling the cell with unformatted text
    For i = 0 To UBound(arr, 2)
      'This debug.print shows the tempString that was recorded and the associated formats
        Debug.Print arr(0, i) & " Bold=" & arr(1, i) & " Italic=" & arr(2, i) & " Underline=" & arr(3, i) & " RGB=" & arr(4, i) & " Size =" & arr(5, i)
        rng.Value = rng.Value + arr(0, i)
    Next
    'Applying formats according to the arrays
    nbChars = 1
    For i = 0 To UBound(arr, 2)
        If arr(0, i) = vbLf Then
            nbChars = nbChars + 1
        Else
            rng.Characters(nbChars, arr(6, i)).Font.Bold = arr(1, i)
            rng.Characters(nbChars, arr(6, i)).Font.Italic = arr(2, i)
            rng.Characters(nbChars, arr(6, i)).Font.Underline = arr(3, i)
            rng.Characters(nbChars, arr(6, i)).Font.Color = RGB(Split(arr(4, i), ",")(0), Split(arr(4, i), ",")(1), Split(arr(4, i), ",")(2))
            rng.Characters(nbChars, arr(6, i)).Font.Size = CInt(arr(5, i))
            nbChars = nbChars + arr(6, i)
        End If
    Next
End Sub

I feel like this sub is complex and the reason I wanted to answer with it is because it could help anyone trying to accomplish a similar goal. Of course, some adjustments will be needed. This is function I used to go from formatted text to HTML-like text. It's not part of the question but will help to understand the tags. It's based off a function I found online (though I cannot remember where). If you want to use both subs as-is, then be sure to remove the <html> and </html> tags at the beginning and end of the HTML string that this function puts.

Function fnConvert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn, sizTagOn As Boolean
    Dim i, chrCount As Integer
    Dim chrCol, chrLastCol, chrSiz, chrLastSiz, htmlTxt As String

    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    sizTagOn = False
    chrCol = "NONE"
    htmlTxt = "<html>"
    chrCount = myCell.Characters.Count

    For i = 1 To chrCount
        With myCell.Characters(i, 1)
            'If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If chrCol <> chrLastCol Then
                    htmlTxt = htmlTxt & "<c=#" & chrCol & ">"
                    chrLastCol = chrCol
                End If
            'End If

            If (.Font.Size) Then
                chrSiz = .Font.Size
                If Len(chrSiz) = 1 Then chrSiz = "0" & chrSiz
                If Not chrLastSiz = chrSiz Then
                    htmlTxt = htmlTxt & "<s=" & chrSiz & ">"
                End If
                chrLastSiz = chrSiz
            End If

            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    htmlTxt = htmlTxt & "</b>"
                    bldTagOn = False
                End If
            End If

            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    htmlTxt = htmlTxt & "</i>"
                    itlTagOn = False
                End If
            End If

            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    htmlTxt = htmlTxt & "</u>"
                    ulnTagOn = False
                End If
            End If

            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & "[LF]"
            Else
                htmlTxt = htmlTxt & .Text
            End If
        End With
    Next

    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    htmlTxt = htmlTxt & "</html>"
    fnConvert2HTML = htmlTxt
End Function

Upvotes: 1

GCSDC
GCSDC

Reputation: 3540

You may loop through all characters on the cell, check whether you`re not inside and html tag and add the text to another cell. The following code will do it (considering input from cell A1 and output to cell A2, both on activesheet):

 Sub RemoveHtmlTags()
    Dim charaux As String
    Dim insideHTMLtag As Boolean

    insideHTMLtag = False

    For i = 1 To Len(ActiveSheet.Cells(1, 1).Value)

        charaux = Mid(ActiveSheet.Cells(1, 1).Value, i, 1)

        If (charaux = "<") Then
            insideHTMLtag = True
        ElseIf (charaux = ">") Then
            insideHTMLtag = False
        ElseIf (Not insideHTMLtag) Then
            ActiveSheet.Cells(2, 1).Value = ActiveSheet.Cells(2, 1).Value + Mid(ActiveSheet.Cells(1, 1).Value, i, 1)
        End If

    Next i
End Sub

I`ve tested it on Excel 2016 for Mac and it works fine.

Upvotes: 0

Related Questions