Reputation: 3106
I needed to convert the following function to Python to deobfuscate text extracted while web scraping:
function obfuscateText(coded, key) {
    // Email obfuscator script 2.1 by Tim Williams, University of Arizona
    // Random encryption key feature by Andrew Moulden, Site Engineering Ltd
    // This code is freeware provided these four comment lines remain intact
    // A wizard to generate this code is at http://www.jottings.com/obfuscator/
    shift = coded.length
    link = ""
    for (i = 0; i < coded.length; i++) {
        if (key.indexOf(coded.charAt(i)) == -1) {
            ltr = coded.charAt(i)
            link += (ltr)
        }
        else {
            ltr = (key.indexOf(coded.charAt(i)) - shift + key.length) % key.length
            link += (key.charAt(ltr))
        }
    }
    document.write("<a href='mailto:" + link + "'>" + link + "</a>")
}
Here is my Python equivalent:
def obfuscateText(coded,key):
    shift = len(coded)
    link = ""
    for i in range(0,len(coded)):
        inkey=key.index(coded[i]) if coded[i] in key else None
        if ( not inkey):
            ltr = coded[i]
            link += ltr
        else:
            ltr = (key.index(coded[i]) - shift + len(key)) % len(key)
            link += key[ltr]
    return link
print obfuscateText("uw#287u##Guw#287Xw8Iwu!#W7L#", "WXYVZabUcdTefgShiRjklQmnoPpqOrstNuvMwxyLz01K23J456I789H.@G!#$F%&amp;E'*+D-/=C?^B_`{A|}~")
actionattraction$comcastWnet
But the output is slightly wrong: instead of actionattraction@comcast.net I get the string above. The same code also often produces random-looking characters for the same HTML page.
The target HTML page contains a JavaScript call to obfuscateText with the coded string and the key. I extract that call into obsfunc and execute it on the fly:
email=eval(obsfunc)
This stores the email in the variable above. It works most of the time but fails occasionally. I strongly suspect the problem lies in the arguments passed to the Python function: since they contain special characters, they may need escaping or conversion. I tried passing raw strings and conversions such as repr(), but the problem persisted.
Some examples for actionattraction@comcast.net, computed both wrongly and correctly by the same Python function (the first line of each pair is the resulting email):
@ation@ttr@ationVaoma@st!nct
obfuscateText("KMd%Y@Kdd8KMd%Y@IMY!MKcdJ@*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y@X!WV#U$T%S&amp;RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~")
}ction}ttr}ction@comc}st.net
obfuscateText("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&amp;%$#!@.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~")
actionattraction@comcast.net
obfuscateText("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&amp;$#!@.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~")
Upvotes: 0
Views: 2304
Reputation: 56694
I've rewritten the deobfuscator:
def deobfuscate_text(coded, key):
    offset = (len(key) - len(coded)) % len(key)
    shifted_key = key[offset:] + key[:offset]
    lookup = dict(zip(key, shifted_key))
    return "".join(lookup.get(ch, ch) for ch in coded)
and tested it as
tests = [
    ("KMd%Y@Kdd8KMd%Y@IMY!MKcdJ@*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y@X!WV#U$T%S&RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~"),
    ("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&%$#!@.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~"),
    ("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&$#!@.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~"),
    ("[email protected]", "ksYSozqUyFOx9uKvQa2P4lEBhMRGC8g6jZXiDwV5eJcAp7rIHL31bnTWmN0dft")
]
for coded,key in tests:
    print(deobfuscate_text(coded, key))
which gives
actionattraction@comcast.net
actionattraction@comcast.net
actionattraction@comcast.net
[email protected]
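As a rough sanity check (not part of the original answer), the rotated-key lookup can be compared against the index arithmetic used in the JavaScript version. The helper name decode_by_index, the toy key, and the toy input are made up for this sketch; deobfuscate_text is copied from above. The two approaches should agree as long as every character in the key is unique.

```python
def deobfuscate_text(coded, key):
    # Rotate the key so that key[i] maps to key[(i - len(coded)) % len(key)].
    offset = (len(key) - len(coded)) % len(key)
    shifted_key = key[offset:] + key[:offset]
    lookup = dict(zip(key, shifted_key))
    return "".join(lookup.get(ch, ch) for ch in coded)


def decode_by_index(coded, key):
    # Hypothetical helper: a direct port of the JavaScript formula.
    shift = len(coded)
    out = []
    for ch in coded:
        pos = key.find(ch)  # -1 when ch is not in the key, like indexOf
        if pos == -1:
            out.append(ch)
        else:
            out.append(key[(pos - shift + len(key)) % len(key)])
    return "".join(out)


# Toy key and input, chosen only for illustration.
key = "abcdefghij"
coded = "cab-j"
print(deobfuscate_text(coded, key))  # hfg-e
print(decode_by_index(coded, key))   # hfg-e
```

Building the lookup table once avoids a linear search through the key for every character, which is the main practical difference between the two versions.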
Note that all three key strings contain &amp;; replacing it with & fixes the problem. Presumably the JavaScript was HTML-escaped by mistake at some point. Python has a module that decodes HTML entities, like so:
# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.4+ (HTMLParser.unescape was removed in Python 3.9):
import html
unescaped = html.unescape(my_string)
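To make the effect concrete, here is a minimal Python 3 sketch that unescapes a key before decoding. It assumes the key arrives from the page source with & encoded as &amp;, as in the first example from the question; raw_key is a name chosen for this sketch, and deobfuscate_text is the function from this answer.

```python
import html


def deobfuscate_text(coded, key):
    offset = (len(key) - len(coded)) % len(key)
    shifted_key = key[offset:] + key[:offset]
    lookup = dict(zip(key, shifted_key))
    return "".join(lookup.get(ch, ch) for ch in coded)


# Assumption: this is the key as scraped, with "&" still written as "&amp;".
raw_key = "WXYVZabUcdTefgShiRjklQmnoPpqOrstNuvMwxyLz01K23J456I789H.@G!#$F%&amp;E'*+D-/=C?^B_`{A|}~"
coded = "uw#287u##Guw#287Xw8Iwu!#W7L#"

# Decoding the entity shortens the key from 87 to 83 characters.
key = html.unescape(raw_key)
print(len(raw_key), len(key))        # 87 83
print(deobfuscate_text(coded, key))  # actionattraction@comcast.net
```

The four extra characters change the key length and the position of every character after the ampersand, which is why only some letters come out wrong when the escaped key is used.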
Upvotes: 1
Reputation: 7923
First of all, index doesn't return None when the character is missing; it raises an exception (ValueError). In your case, W appears instead of a dot because the index returned is 0, and not inkey (which is also the wrong test) mistakenly concludes that the character is not present in the key.
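The pitfall described above can be reproduced in a few lines. This is only an illustration with a made-up four-character key, not the asker's data:

```python
key = "WXYZ"  # toy key for illustration; "W" sits at index 0

pos = key.index("W") if "W" in key else None
print(pos)          # 0
print(not pos)      # True: 0 is falsy, so "W" looks as if it were missing
print(pos is None)  # False: an explicit None test gives the right answer

# str.find returns -1 for a missing character, mirroring JavaScript's indexOf.
print(key.find("W"), key.find("?"))  # 0 -1
```

Testing with is None, or comparing the result of find against -1, keeps index 0 from being treated as "not found".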
Second, the presence of &amp; suggests that you may indeed have to find and decode HTML entities.
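Finally, I'd recommend rewriting it like this:
def obfuscateText(code, key):
    len0 = len(code)
    len1 = len(key)
    link = ''
    for ch in code:
        try:
            # Shift back by the length of the coded string, wrapping around the key.
            ch = key[(key.index(ch) - len0 + len1) % len1]
        except ValueError:
            # Character is not in the key: keep it unchanged.
            pass
        link += ch
    return link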
Upvotes: 1