Doing a DNS resolve on a Unicode hostname returns the following:
'\195\164\195\182\195\188o.mydomain104.local.'
The \195\164 is actually the UTF-8 encoding (decimal bytes 195 and 164, i.e. 0xC3 0xA4) of the following Unicode letter: ä (u'\xe4'), the lowercase form of Ä.
The original hostname is:
ÄÖÜO.mydomain104.local
I'm looking for a way to convert it back to the Unicode string (in Python 2.7).
In case the original code is needed, it's something like the following:
from dns import resolver, reversename
from dns.exception import DNSException

def get_name(ip_address):
    answer = None
    res = resolver.Resolver()
    addr = reversename.from_address(ip_address)
    try:
        answer = res.query(addr, "PTR")[0].to_text().decode("utf-8")
    except DNSException:
        pass
    return answer
I was looking at both .encode and .decode, the unicodedata lib and codecs, and found nothing that worked.
Upvotes: 0
Views: 298
Clue #1:
In [1]: print(b'\xc3\xa4\xc3\xb6\xc3\xbc'.decode('utf_8'))
äöü
In [2]: print(bytearray([195,164,195,182,195,188]).decode('utf-8'))
äöü
Clue #2: Per the docs, Python interprets \ooo as the character with octal value ooo, and \xhh as the character with hex value hh.
Since 9 is not a valid octal digit, '\195' is interpreted as '\1' followed by the two characters '95'.
hex(195) is '0xc3'. So instead of '\195' we want '\xc3'.
We need to convert the decimal number after each backslash into the form \xhh.
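The escape rules above can be checked directly. This is only a small sketch, written with Python 3 bytes literals so that each escape maps to exactly one byte; the variable names are made up for illustration.

```python
# b'\195' is NOT one byte: it is the octal escape \1 followed by '9' and '5'.
octal_attempt = b'\195'
# b'\xc3' is one byte with value 195 (hex c3).
hex_escape = b'\xc3'
# Octal 303 is also decimal 195, so this is the same single byte.
octal_escape = b'\303'

print(len(octal_attempt), octal_attempt == b'\x01' + b'95')
print(hex_escape == octal_escape == bytes([195]))
# The byte pair 195, 164 is the UTF-8 encoding of U+00E4.
print(b'\xc3\xa4'.decode('utf-8') == '\xe4')
```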
In Python 2:
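import re
given = r'\195\164\195\182\195\188o.mydomain104.local.'
# print(list(given))
# Rewrite each three-digit decimal escape \DDD as a two-digit hex escape \xhh.
# {:02x} pads values below 16, which string_escape would otherwise reject.
decimals_to_hex = re.sub(r'\\(\d{3})', lambda match: '\\x{:02x}'.format(int(match.group(1))), given)
# print(list(decimals_to_hex))
# string_escape turns the \xhh escapes into raw bytes (a UTF-8 encoded str).
result = decimals_to_hex.decode('string_escape')
print(result)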
prints
äöüo.mydomain104.local.
In Python 3, use codecs.escape_decode instead of decode('string_escape'):
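import re
import codecs
given = rb'\195\164\195\182\195\188o.mydomain104.local.'
# Rewrite each \DDD decimal escape as \xhh; the replacement must be bytes.
decimals_to_hex = re.sub(
    rb'\\(\d{3})',
    lambda match: '\\x{:02x}'.format(int(match.group(1))).encode('ascii'),
    given)
# codecs.escape_decode is undocumented; it returns (decoded bytes, length consumed).
print(codecs.escape_decode(decimals_to_hex)[0].decode('utf-8'))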
prints the same result.
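As a possible alternative, the intermediate \xhh step can be skipped by turning each decimal escape straight into a byte. The sketch below assumes the input uses three-digit \DDD escapes, as in the example above. The helper name decode_dns_text is hypothetical and not part of dnspython or the standard library.

```python
import re

def decode_dns_text(text):
    # Hypothetical helper: replace every \DDD escape with the byte it names,
    # then decode the resulting byte string as UTF-8.
    raw = text if isinstance(text, bytes) else text.encode('ascii')
    unescaped = re.sub(
        br'\\(\d{3})',
        # bytes(bytearray([n])) builds a one-byte string on Python 2 and 3;
        # it raises ValueError if the number is above 255.
        lambda m: bytes(bytearray([int(m.group(1))])),
        raw,
    )
    return unescaped.decode('utf-8')

print(decode_dns_text(r'\195\164\195\182\195\188o.mydomain104.local.'))
```

The function returns a unicode object on Python 2 and a str on Python 3, so the result can be compared with the expected hostname directly.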
Upvotes: 4