Asif
Asif

Reputation: 151

Convert a URL formatted content to plain text in microsoft EXCEL

I have URL formatted content, usually I just translate manually 1-by-1m but this time there are thousand of entry, eg:-

%E5%B7%B2%E4%BB%8E%E5%B8%90%E6%88%B7zh*****%40outlook.com%E5%88%A0%E9%99%A48618650533*%E3%80%82%E4%B8%8D%E6%98%AF%E4%BD%A0%EF%BC%9F
%E7%AE%A1%E7%90%86%E9%A2%84%E8%AE%A2%0A
https%3A%2F%2Faccount.live.com%2Fa
OTO+GLOBAL+Certification+No%3A%5B6198%5D
Deluxe+Room+-1+%E9%97%B4%0A
Ihre+Agoda+Buchung+Nr.+77083713+ist+bes %C3%A4tigt%21+Verwalten+Sie+Ihre+B
%E6%82%A8%E7%9A%84Agoda%E8%AE%A2%E5%8D%95%2877083753%29%E5%B7%B2%E7%A%AE%E8%AE%A4%EF%BC%81+%E4%BD%BF%E7%94%A8%E6%88%91%E4%BB%AC%E7%9A%84%E5%85%8D%E8%B4%B9%E5%AE%A2%E6%88%B7%E7%AB%AFhttp%3A%2F%2Fapp-agoda.com%2FGetTheApp%EF%BC%8C%E8%BD%BB%E6%9D%BE

Is there any way to convert all of this content to plain English text in Microsoft Excel?

Regards

Upvotes: 3

Views: 17470

Answers (6)

Tony
Tony

Reputation: 109

I ran into a similar issue and the solution above which uses a custom User Defined function did not work for me as it only accounts for URLs that contain the first 256 characters on the ASCII Table which are represented by 1 byte.

However if a URL contains strange characters these would get encoded In UTF-8 which extends the ASCII character set and are represented in the URL by more than 1 byte. For example in a URL the following characters would be encoded as such:

'€' = '%E2%82%AC' and 'Á' = %C3%81

Below is a function that will also decode these strange characters as well, it reads the first 2 characters after finding a "%", (the header byte), to determines how many bytes the encoded character consists of, it then retrieves those additional encodings as needed and mathematically converts it to its Unicode code point representation, and converts it to the the correct character according to it's UTF-8 Encoding:

Function UTF_DecodeURL(ByVal URL As String) As String
    Dim i As Integer
    Dim charCode As String
    Dim DecodedURL As String
    Dim TempPath As String
    
    For i = 1 To Len(URL)
        'if it finds a percent, perform a decoce
        If Mid(URL, i, 1) = "%" Then
            'STEP 1, figure out how many bytes the encoding has we do this by converting the following hex to decimal
            '1 Byte (from 1-127), 2Bytes ( 194-223) , 3 Bytes (224-239), 4Bytes (240-247)
            ' refer to this site for documentation on calculation of unipoint code value
            'https://www.freecodecamp.org/news/what-is-utf-8-character-encoding/#:~:text=UTF%2D8%20extends%20the%20ASCII,as%20the%20non%2Dprintable%20characters.
            charCode = Mid(URL, i + 1, 2) 'get the next two characters after the % that is the headder hex string
            headerVal = Val("&H" & charCode) ' convert it to decimal
            
            If (headerVal >= 1 And headerVal <= 127) Then 'has 1Byte like '%26' "&"
                DecodedURL = DecodedURL & Chr(Val("&H" & charCode))
                i = i + 2
            ElseIf (headerVal >= 194 And headerVal <= 223) Then ' like %C3%81 Á
                Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
                Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
                UnicodePointVal = ((Byte1DeciVal - 194) * 64) + Byte2DeciVal
                DecodedURL = DecodedURL & Chr(UnicodePointVal)
                i = i + 5
            ElseIf (headerVal >= 224 And headerVal <= 239) Then ' like '%E2%82%AC' "€"
                Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
                Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
                Byte3DeciVal = Val("&H" & Mid(URL, i + 7, 2))
                UnicodePointVal = ((Byte1DeciVal - 224) * 4096) + ((Byte2DeciVal - 128) * 64) + (Byte3DeciVal - 128)
                DecodedURL = DecodedURL & ChrW(UnicodePointVal)
                i = i + 8
            ElseIf (headerVal >= 240 And headerVal <= 247) Then ' Like '%f0%90%8a%83' this range contains some pretty strange symbols excel does not correctly render characters with a Unipoint code large than 65535. These ones all appear like empty boxes even if you copy/paste the character in excel directly.
                Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
                Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
                Byte3DeciVal = Val("&H" & Mid(URL, i + 7, 2))
                Byte4DeciVal = Val("&H" & Mid(URL, i + 10, 2))
                UnicodePointVal = ((Byte1DeciVal - 240) * 262144) + ((Byte2DeciVal - 128) * 4096) + ((Byte3DeciVal - 128) * 64) + (Byte4DeciVal - 128)
                DecodedURL = DecodedURL & WorksheetFunction.Unichar(UnicodePointVal)
                i = i + 11
            End If
            
            
         'if the character at this index is not a % then add it to the output string
        Else
            DecodedURL = DecodedURL & Mid(URL, i, 1)
        End If
    Next i
    
    UTF_DecodeURL = DecodedURL

End Function

This article did a great job at explaining How UTF-8 Encoding Works, and How Much Storage Each Character Uses. It was very helpful in understanding how these characters are related to the percent encoding. https://www.freecodecamp.org/news/what-is-utf-8-character-encoding/#:~:text=UTF%2D8%20extends%20the%20ASCII,as%20the%20non%2Dprintable%20characters

Upvotes: 0

Warwick Allen
Warwick Allen

Reputation: 71

You can now use a recursive lambda function to create a succinct URL decoder function.

Open the Name Manager and define “DECODEURL” as:

=LAMBDA(encodedUrl,
  LET(
    length, LEN(encodedUrl),
    head, LEFT(encodedUrl, 1),
    IF(length < 3,
      encodedUrl,
      IF(head = "%",
        CONCAT(
          CHAR(HEX2DEC(MID(encodedUrl, 2, 2))),
          DECODEURL(RIGHT(encodedUrl, length - 3))
        ),
        CONCAT(
          head,
          DECODEURL(RIGHT(encodedUrl, length - 1))
        )
      )
    )
  )
)

Then, you can simply invoke the function with =DECODEURL(CELL_TO_BE_DECODED).

Notes
  1. There is no input validation, so this function will return #NUM! if there is an invalid string (such as %A%) in the URL.
  2. Ensure there are no tab characters in the function definition, as that will prevent Excel from recognising it as a valid formula.

Upvotes: 2

pitspoon
pitspoon

Reputation: 1

You can do it without VBA using array formulas. If A1 is the cell to be decoded, enter this formula, and then press Ctrl-Shift-Enter:

=TEXTJOIN("", FALSE,
    MID(A1,
        FIND("*",
            SUBSTITUTE("%DD"&A1,"%","*",  ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1))   )
        ),
        FIND("*",
            SUBSTITUTE(A1&"%","%","*",    ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1))   )
        ) - FIND("*",
            SUBSTITUTE("%EE"&A1,"%","*",  ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1))   )
        )
    ) & IFERROR(CHAR(HEX2DEC(MID(A1,
        FIND("*",
            SUBSTITUTE(A1&"%","%","*",    ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1))   )
        )+1,
       2
    ))),"")
 )

It won't work if you don't Ctrl-Shift-Enter.

Upvotes: 0

PowerTools
PowerTools

Reputation: 29

The following adds decoding for commas in URLs. Just an additional SUBSTITUTE for %2C. This just adds to Carlos's post from 4 years ago.

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]"),"%2D","-"),"%2C",",")

Upvotes: 2

Carlos Miranda
Carlos Miranda

Reputation: 381

There isn't a built-in function to handle this, but it's possible with a custom function, installing a third-party add-in, or using the substitute command:

Using a custom VBA function

Source: http://www.freevbcode.com/ShowCode.asp?ID=1512

Public Function URLDecode(StringToDecode As String) As String

Dim TempAns As String
Dim CurChr As Integer

CurChr = 1

Do Until CurChr - 1 = Len(StringToDecode)
  Select Case Mid(StringToDecode, CurChr, 1)
    Case "+"
      TempAns = TempAns & " "
    Case "%"
      TempAns = TempAns & Chr(Val("&h" & _
         Mid(StringToDecode, CurChr + 1, 2)))
       CurChr = CurChr + 2
    Case Else
      TempAns = TempAns & Mid(StringToDecode, CurChr, 1)
  End Select

CurChr = CurChr + 1
Loop

URLDecode = TempAns
End Function

With third-party add-in

Source: SeoTools (needs installation)

=UrlDecode(your_string_here)

With substitute command

Source: https://searchmarketingcorner.wordpress.com/2013/03/27/creating-an-excel-formula-to-encode-or-unencode-urls/

Paste the formula below to the right of your cell in order to URL decode the contents of that cell

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]")

Or for working with GUIDs, add one more SUBSTITUTE for the dashes.

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]"),"%2D","-")

For completeness, here is the reverse formula for URL encode. This is the same as the URL encode formula but positions of new_text and old_text swapped around.

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_ENCODED,"?","%3F")," ","%20"),"%","%25"),"&","%26"),"=","%3D"),"{","%7B"),"}","%7D"),"[","%5B"),"]","%5D")

Or for working with GUIDs, add one more SUBSTITUTE for the dashes.

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_ENCODED,"?","%3F")," ","%20"),"%","%25"),"&","%26"),"=","%3D"),"{","%7B"),"}","%7D"),"[","%5B"),"]","%5D"),"-","%2D")

Upvotes: 11

Excel Hero
Excel Hero

Reputation: 14764

Here is a User Defined Function (UDF) that actually works.

In a standard code module, place this routine:

Public Function URLDecode(url$) As String
    With CreateObject("ScriptControl")
        .Language = "JavaScript"
        URLDecode = .Eval("unescape(""" & url & """)")
    End With
End Function

Now you can call it from the worksheet, just like a built-in Excel function.

For example, if your encoded URL text were in cell A1, you could enter the following formula in cell B1:

=URLDecode(A1)

That's it. The fully decoded URL is now in cell B1.

Note that this is the real deal. It is not an attempt to replace a couple of characters. This uses the full power of JavaScript by way of the Microsoft Script Control to completely decode the URL.

Upvotes: 4

Related Questions