Xayden Rosario

Reputation: 177

Converting a byte non-escaped hex string to a string

I'm scraping a script from a website and came across this variable:

var string = "\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"

It's part of a longer script, so I extract it in Python as a substring, as follows.

Say the div that contains the script is stored in a variable div, so script = div.script.text returns the script text. I find the start of the assignment with st = script.find("var string=") and the end of the statement with end = script.find(";", st), then slice out the value with string = script[st + 11: end - 1]. Running print(string) then prints:

"\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"

But I can't get its actual value. Running Python in a terminal shows the following:

>>> string = "\x61\x48\x52\x30\x63\x44\x6f\x76\x4c\x33\x42\x73\x64\x43\x35\x68\x62\x6d\x6c\x74\x5a\x57\x68\x6c\x59\x58\x5a\x6c\x62\x69\x35\x6c\x64\x53\x7c\x72\x63\x33\x6c\x6b\x63\x32\x51\x76\x51\x6c\x38\x74\x58\x31\x52\x6f\x5a\x56\x7c\x43\x5a\x57\x64\x70\x62\x6d\x35\x70\x62\x6d\x63\x74\x4c\x54\x45\x74\x4c\x54\x45\x31\x4d\x6a\x41\x77\x4e\x44\x51\x78\x4d\x7a\x63\x75\x62\x58\x41\x30\x50\x33\x64\x33\x4e\x58\x63\x30\x4d\x51\x3d\x3d"
>>> string
'aHR0cDovL3BsdC5hbmltZWhlYXZlbi5ldS|rc3lkc2QvQl8tX1RoZV|CZWdpbm5pbmctLTEtLTE1MjAwNDQxMzcubXA0P3d3NXc0MQ=='

That 'aHR0cDovL3BsdC5hbmltZWhlYXZlbi5ldS|rc3lkc2QvQl8tX1RoZV|CZWdpbm5pbmctLTEtLTE1MjAwNDQxMzcubXA0P3d3NXc0MQ==' is what I need, so how do I get it?

Upvotes: 0

Views: 1026

Answers (3)

Xayden Rosario

Reputation: 177

I found the solution a long time ago and forgot to post the answer; sorry to anyone who ran into the same problem.

First, strip the \x prefixes from the scraped string so that only the hex digits remain:

# Raw string: the scraped text contains literal backslashes, not the decoded characters.
un_escaped_hex_string = r"\x61\x48\x52\..."
escaped_hex_string = un_escaped_hex_string.replace("\\x", "")

With the \x prefixes removed, what remains is a plain hex string. To get its value, do the following:

byte_value = bytes.fromhex(escaped_hex_string)
value = byte_value.decode('utf-8')
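
As a minimal end-to-end sketch of the two steps above, here they are applied to only the first four escapes of the string from the question. The variable names are illustrative, and the raw string stands in for text read from the page, where the backslashes are literal characters.

```python
# Raw string: backslashes are literal, as in text scraped from a page.
scraped = r"\x61\x48\x52\x30"

# Step 1: drop every "\x" prefix, leaving only the hex digits.
hex_digits = scraped.replace("\\x", "")
print(hex_digits)  # 61485230

# Step 2: turn the hex digits into bytes, then into text.
decoded = bytes.fromhex(hex_digits).decode("utf-8")
print(decoded)  # aHR0
```

Note that this only works if every character in the scraped text is written as a \xNN escape; any unescaped character left in the string would make bytes.fromhex raise a ValueError.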

Upvotes: 2

chepner

Reputation: 531165

You can use ast.literal_eval, since the JavaScript string literal in question is also a valid Python string literal.

>>> import ast
>>> x = r'"\x61\x48"'
>>> ast.literal_eval(x)
'aH'
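
A slightly fuller sketch of the same idea, assuming the slice taken from the script still includes the surrounding double quotes. If the quotes were already stripped, the unicode_escape codec is one possible alternative; that variant is my assumption and not part of the approach above. Variable names are illustrative.

```python
import ast

# Text as it would appear in the scraped script, quotes included.
js_literal = r'"\x61\x48\x52\x30"'
value = ast.literal_eval(js_literal)
print(value)  # aHR0

# If the quotes were already sliced off, decode the escapes directly.
bare = r"\x61\x48\x52\x30"
value_from_bare = bare.encode("ascii").decode("unicode_escape")
print(value_from_bare)  # aHR0
```

This relies on \xNN escapes meaning the same thing in both languages; JavaScript literals that use escapes Python does not share would need different handling.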

Upvotes: 0

Mark Ransom

Reputation: 308158

Your string is Base64 encoded: it has a certain look to it, and the == at the end is a dead giveaway. You can use the base64 module to turn it back into a byte string.

import base64
base64.b64decode(string)
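
As a small illustration, here is the same call applied to only the first twelve characters of the decoded value from the question. The full string also contains | characters, which are not part of the standard Base64 alphabet; how those should be handled is not stated in the question, so this sketch leaves them out.

```python
import base64

# Leading portion of the value shown in the question (no "|" characters).
prefix = "aHR0cDovL3Bs"

decoded = base64.b64decode(prefix)
print(decoded)                  # b'http://pl'
print(decoded.decode("ascii"))  # http://pl
```

b64decode returns bytes, so a final decode call is needed to get a str.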

Upvotes: 0
