Reputation: 151
I have URL formatted content, usually I just translate manually 1-by-1m but this time there are thousand of entry, eg:-
%E5%B7%B2%E4%BB%8E%E5%B8%90%E6%88%B7zh*****%40outlook.com%E5%88%A0%E9%99%A48618650533*%E3%80%82%E4%B8%8D%E6%98%AF%E4%BD%A0%EF%BC%9F
%E7%AE%A1%E7%90%86%E9%A2%84%E8%AE%A2%0A
https%3A%2F%2Faccount.live.com%2Fa
OTO+GLOBAL+Certification+No%3A%5B6198%5D
Deluxe+Room+-1+%E9%97%B4%0A
Ihre+Agoda+Buchung+Nr.+77083713+ist+bes %C3%A4tigt%21+Verwalten+Sie+Ihre+B
%E6%82%A8%E7%9A%84Agoda%E8%AE%A2%E5%8D%95%2877083753%29%E5%B7%B2%E7%A%AE%E8%AE%A4%EF%BC%81+%E4%BD%BF%E7%94%A8%E6%88%91%E4%BB%AC%E7%9A%84%E5%85%8D%E8%B4%B9%E5%AE%A2%E6%88%B7%E7%AB%AFhttp%3A%2F%2Fapp-agoda.com%2FGetTheApp%EF%BC%8C%E8%BD%BB%E6%9D%BE
Is there any way to convert all of this content to plain English text in Microsoft Excel?
Regards
Upvotes: 3
Views: 17470
Reputation: 109
I ran into a similar issue and the solution above which uses a custom User Defined function did not work for me as it only accounts for URLs that contain the first 256 characters on the ASCII Table which are represented by 1 byte.
However if a URL contains strange characters these would get encoded In UTF-8 which extends the ASCII character set and are represented in the URL by more than 1 byte. For example in a URL the following characters would be encoded as such:
'€' = '%E2%82%AC' and 'Á' = %C3%81
Below is a function that will also decode these strange characters as well, it reads the first 2 characters after finding a "%", (the header byte), to determines how many bytes the encoded character consists of, it then retrieves those additional encodings as needed and mathematically converts it to its Unicode code point representation, and converts it to the the correct character according to it's UTF-8 Encoding:
Function UTF_DecodeURL(ByVal URL As String) As String
Dim i As Integer
Dim charCode As String
Dim DecodedURL As String
Dim TempPath As String
For i = 1 To Len(URL)
'if it finds a percent, perform a decoce
If Mid(URL, i, 1) = "%" Then
'STEP 1, figure out how many bytes the encoding has we do this by converting the following hex to decimal
'1 Byte (from 1-127), 2Bytes ( 194-223) , 3 Bytes (224-239), 4Bytes (240-247)
' refer to this site for documentation on calculation of unipoint code value
'https://www.freecodecamp.org/news/what-is-utf-8-character-encoding/#:~:text=UTF%2D8%20extends%20the%20ASCII,as%20the%20non%2Dprintable%20characters.
charCode = Mid(URL, i + 1, 2) 'get the next two characters after the % that is the headder hex string
headerVal = Val("&H" & charCode) ' convert it to decimal
If (headerVal >= 1 And headerVal <= 127) Then 'has 1Byte like '%26' "&"
DecodedURL = DecodedURL & Chr(Val("&H" & charCode))
i = i + 2
ElseIf (headerVal >= 194 And headerVal <= 223) Then ' like %C3%81 Á
Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
UnicodePointVal = ((Byte1DeciVal - 194) * 64) + Byte2DeciVal
DecodedURL = DecodedURL & Chr(UnicodePointVal)
i = i + 5
ElseIf (headerVal >= 224 And headerVal <= 239) Then ' like '%E2%82%AC' "€"
Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
Byte3DeciVal = Val("&H" & Mid(URL, i + 7, 2))
UnicodePointVal = ((Byte1DeciVal - 224) * 4096) + ((Byte2DeciVal - 128) * 64) + (Byte3DeciVal - 128)
DecodedURL = DecodedURL & ChrW(UnicodePointVal)
i = i + 8
ElseIf (headerVal >= 240 And headerVal <= 247) Then ' Like '%f0%90%8a%83' this range contains some pretty strange symbols excel does not correctly render characters with a Unipoint code large than 65535. These ones all appear like empty boxes even if you copy/paste the character in excel directly.
Byte1DeciVal = Val("&H" & Mid(URL, i + 1, 2))
Byte2DeciVal = Val("&H" & Mid(URL, i + 4, 2))
Byte3DeciVal = Val("&H" & Mid(URL, i + 7, 2))
Byte4DeciVal = Val("&H" & Mid(URL, i + 10, 2))
UnicodePointVal = ((Byte1DeciVal - 240) * 262144) + ((Byte2DeciVal - 128) * 4096) + ((Byte3DeciVal - 128) * 64) + (Byte4DeciVal - 128)
DecodedURL = DecodedURL & WorksheetFunction.Unichar(UnicodePointVal)
i = i + 11
End If
'if the character at this index is not a % then add it to the output string
Else
DecodedURL = DecodedURL & Mid(URL, i, 1)
End If
Next i
UTF_DecodeURL = DecodedURL
End Function
This article did a great job at explaining How UTF-8 Encoding Works, and How Much Storage Each Character Uses. It was very helpful in understanding how these characters are related to the percent encoding. https://www.freecodecamp.org/news/what-is-utf-8-character-encoding/#:~:text=UTF%2D8%20extends%20the%20ASCII,as%20the%20non%2Dprintable%20characters
Upvotes: 0
Reputation: 71
You can now use a recursive lambda function to create a succinct URL decoder function.
Open the Name Manager and define “DECODEURL” as:
=LAMBDA(encodedUrl,
LET(
length, LEN(encodedUrl),
head, LEFT(encodedUrl, 1),
IF(length < 3,
encodedUrl,
IF(head = "%",
CONCAT(
CHAR(HEX2DEC(MID(encodedUrl, 2, 2))),
DECODEURL(RIGHT(encodedUrl, length - 3))
),
CONCAT(
head,
DECODEURL(RIGHT(encodedUrl, length - 1))
)
)
)
)
)
Then, you can simply invoke the function with =DECODEURL(CELL_TO_BE_DECODED)
.
#NUM!
if there is an invalid string (such as %A%
) in the URL.Upvotes: 2
Reputation: 1
You can do it without VBA using array formulas. If A1 is the cell to be decoded, enter this formula, and then press Ctrl-Shift-Enter:
=TEXTJOIN("", FALSE,
MID(A1,
FIND("*",
SUBSTITUTE("%DD"&A1,"%","*", ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1)) )
),
FIND("*",
SUBSTITUTE(A1&"%","%","*", ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1)) )
) - FIND("*",
SUBSTITUTE("%EE"&A1,"%","*", ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1)) )
)
) & IFERROR(CHAR(HEX2DEC(MID(A1,
FIND("*",
SUBSTITUTE(A1&"%","%","*", ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE(A1,"%",""))+1)) )
)+1,
2
))),"")
)
It won't work if you don't Ctrl-Shift-Enter.
Upvotes: 0
Reputation: 29
The following adds decoding for commas in URLs. Just an additional SUBSTITUTE
for %2C
. This just adds to Carlos's post from 4 years ago.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]"),"%2D","-"),"%2C",",")
Upvotes: 2
Reputation: 381
There isn't a built-in function to handle this, but it's possible with a custom function, installing a third-party add-in, or using the substitute command:
Source: http://www.freevbcode.com/ShowCode.asp?ID=1512
Public Function URLDecode(StringToDecode As String) As String
Dim TempAns As String
Dim CurChr As Integer
CurChr = 1
Do Until CurChr - 1 = Len(StringToDecode)
Select Case Mid(StringToDecode, CurChr, 1)
Case "+"
TempAns = TempAns & " "
Case "%"
TempAns = TempAns & Chr(Val("&h" & _
Mid(StringToDecode, CurChr + 1, 2)))
CurChr = CurChr + 2
Case Else
TempAns = TempAns & Mid(StringToDecode, CurChr, 1)
End Select
CurChr = CurChr + 1
Loop
URLDecode = TempAns
End Function
Source: SeoTools (needs installation)
=UrlDecode(your_string_here)
Paste the formula below to the right of your cell in order to URL decode the contents of that cell
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]")
Or for working with GUIDs, add one more SUBSTITUTE for the dashes.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_DECODED,"%3F","?"),"%20"," "),"%25", "%"),"%26","&"),"%3D","="),"%7B","{"),"%7D","}"),"%5B","["),"%5D","]"),"%2D","-")
For completeness, here is the reverse formula for URL encode. This is the same as the URL encode formula but positions of new_text
and old_text
swapped around.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_ENCODED,"?","%3F")," ","%20"),"%","%25"),"&","%26"),"=","%3D"),"{","%7B"),"}","%7D"),"[","%5B"),"]","%5D")
Or for working with GUIDs, add one more SUBSTITUTE for the dashes.
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(CELL_TO_BE_ENCODED,"?","%3F")," ","%20"),"%","%25"),"&","%26"),"=","%3D"),"{","%7B"),"}","%7D"),"[","%5B"),"]","%5D"),"-","%2D")
Upvotes: 11
Reputation: 14764
Here is a User Defined Function (UDF) that actually works.
In a standard code module, place this routine:
Public Function URLDecode(url$) As String
With CreateObject("ScriptControl")
.Language = "JavaScript"
URLDecode = .Eval("unescape(""" & url & """)")
End With
End Function
Now you can call it from the worksheet, just like a built-in Excel function.
For example, if your encoded URL text were in cell A1, you could enter the following formula in cell B1:
=URLDecode(A1)
That's it. The fully decoded URL is now in cell B1.
Note that this is the real deal. It is not an attempt to replace a couple of characters. This uses the full power of JavaScript by way of the Microsoft Script Control to completely decode the URL.
Upvotes: 4