Reputation: 177
I'm scraping a script from a website and came across this variable:
var string = "\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"
It is part of a longer script, so I extract it in Python as a substring.
Say the div that contains the script is stored in a variable div, so that script = div.script.text
returns the script I need. I then find where the assignment begins with st = script.find("var string=")
and where the statement ends with end = script.find(";", st)
so the literal can be sliced out with string = script[st + 11: end - 1]
Now if I run print(string)
it prints
"\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"
but I can't get its actual value. Typing the same literal into the Python interpreter gives the following result:
>>> string = "\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"
>>> string
'aHR0cDovL3BsdC5hbmltZWhlYXZlbi5ldS|rc3lkc2QvQl8tX1RoZV|CZWdpbm5pbmctLTEtLTE1MjAwNDQxMzcubXA0P3d3NXc0MQ=='
That 'aHR0cDovL3BsdC5hbmltZWhlYXZlbi5ldS|rc3lkc2QvQl8tX1RoZV|CZWdpbm5pbmctLTEtLTE1MjAwNDQxMzcubXA0P3d3NXc0MQ=='
is what I need, so how do I get it from the scraped text?
Upvotes: 0
Views: 1026
Reputation: 177
I found the solution a long time ago and forgot to post the answer; apologies to anyone who ran into the same problem.
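First, we need to strip the \x prefixes from the scraped text, which contains literal backslashes, so that only the hex digits remain:
un_escaped_hex_string = r"\x61\x48\x52\..."  # raw string: the backslashes are literal, as in the scraped text
escaped_hex_string = un_escaped_hex_string.replace("\\x", "")
With the \x prefixes removed
we are left with a plain hex string, so to get its value we do the following: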
byte_value = bytes.fromhex(escaped_hex_string)
value = byte_value.decode('utf-8')
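The two steps above can be put together as a minimal sketch. The helper name decode_hex_escapes and the short sample literal are hypothetical, and the approach assumes every character in the scraped literal is written as a \xNN escape; any plain character left in the text would make bytes.fromhex raise a ValueError.

```python
def decode_hex_escapes(literal: str) -> str:
    """Turn scraped text such as "\\x61\\x48" (literal backslashes) into 'aH'."""
    # Drop surrounding whitespace and the quotes that came along with the slice.
    body = literal.strip().strip("\"'")
    # Remove every \x prefix so only the hex digits remain.
    hex_digits = body.replace("\\x", "")
    # Convert the hex digits to bytes, then to text.
    return bytes.fromhex(hex_digits).decode("utf-8")


# Hypothetical sample: the raw string mimics what print(string) showed.
sample = r'"\x61\x48\x52\x30"'
print(decode_hex_escapes(sample))
```

The raw-string prefix on the sample matters: without it Python would resolve the escapes itself and there would be nothing left to strip.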
Upvotes: 2
Reputation: 531165
You can use ast.literal_eval
, since the JavaScript string literal in question is also a valid Python string literal.
>>> import ast
>>> x = r'"\x61\x48"'
>>> ast.literal_eval(x)
'aH'
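As a rough sketch of how this could be applied to the scraped script, the literal can be pulled out with a regular expression, quotes included, and handed to ast.literal_eval. The function name extract_js_string and the sample script text are assumptions made for illustration. This only works when the JavaScript literal happens to be valid Python too, as it is for plain \xNN escapes.

```python
import ast
import re


def extract_js_string(script: str, name: str) -> str:
    """Find `var <name> = "..."` in script and evaluate the quoted literal."""
    # Capture the double-quoted literal, allowing backslash escapes inside it.
    pattern = r'var\s+' + re.escape(name) + r'\s*=\s*("(?:[^"\\]|\\.)*")'
    match = re.search(pattern, script)
    if match is None:
        raise ValueError(f"no string assignment found for {name!r}")
    # literal_eval resolves the \xNN escapes without executing arbitrary code.
    return ast.literal_eval(match.group(1))


# Hypothetical script text standing in for div.script.text.
script = r'var string = "\x61\x48\x52\x30"; var other = 1;'
print(extract_js_string(script, "string"))
```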
Upvotes: 0
Reputation: 308158
Your string is Base64 encoded: it has a certain look to it, and the ==
at the end is a dead giveaway. You can use the base64
module to turn it back into a byte string.
import base64
base64.b64decode(string)
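One caveat: the value shown in the question contains two | characters, which are not part of the Base64 alphabet. By default b64decode silently discards such characters, which shifts everything after the first one. In this particular sample, putting a 9 back in place of each | appears to give a clean decode, but that substitution is an inference from this one string, not documented behaviour of the site. A minimal sketch under that assumption (the helper name is hypothetical):

```python
import base64

# Value from the question, after the \xNN escapes have been resolved.
encoded = (
    "aHR0cDovL3BsdC5hbmltZWhlYXZlbi5ldS|rc3lkc2QvQl8tX1RoZV|"
    "CZWdpbm5pbmctLTEtLTE1MjAwNDQxMzcubXA0P3d3NXc0MQ=="
)


def decode_with_placeholder(text: str, placeholder: str = "|",
                            replacement: str = "9") -> str:
    # Assumption: each placeholder stands in for exactly one Base64 character.
    restored = text.replace(placeholder, replacement)
    # validate=True raises binascii.Error instead of skipping stray characters.
    return base64.b64decode(restored, validate=True).decode("utf-8")


print(decode_with_placeholder(encoded))
```

Passing validate=True is a deliberate choice here: it makes an unexpected character fail loudly rather than produce shifted, garbled bytes.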
Upvotes: 0