Reputation: 761
I have a binary string like this:
1101100110000110110110011000001011011000101001111101100010101000
and I want to decode it as UTF-8 text. How can I do this in Python?
Upvotes: 12
Views: 60665
Reputation: 10787
Cleaner version (Python 2):
>>> test_string = '1101100110000110110110011000001011011000101001111101100010101000'
>>> print ('%x' % int(test_string, 2)).decode('hex').decode('utf-8')
نقاب
Inverse (from @Robᵩ's comment):
>>> '{:b}'.format(int(u'نقاب'.encode('utf-8').encode('hex'), 16))
'1101100110000110110110011000001011011000101001111101100010101000'
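In Python 3, `str` has no `.decode` method and the `'hex'` codec is gone, so the snippet above only runs on Python 2. A minimal Python 3 sketch of the same integer-based idea, assuming the bit string length is a multiple of 8, uses `int.to_bytes`. Passing an explicit byte count also keeps leading zero bytes that `'%x'` would silently drop (for example when the first character is ASCII).

```python
test_string = '1101100110000110110110011000001011011000101001111101100010101000'

# Parse the whole bit string as one integer, then emit exactly len/8 bytes
# (big-endian, so the first 8 bits become the first byte).
n_bytes = len(test_string) // 8
raw = int(test_string, 2).to_bytes(n_bytes, 'big')
text = raw.decode('utf-8')  # 'نقاب' (U+0646 U+0642 U+0627 U+0628)

# Inverse: bytes -> int -> binary string zero-padded to the full width.
encoded = text.encode('utf-8')
bits = format(int.from_bytes(encoded, 'big'), '0{}b'.format(len(encoded) * 8))
print(bits == test_string)  # True
```

The explicit width in the inverse plays the same role as the byte count in `to_bytes`: without it, a leading `0` bit would be lost.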
Upvotes: 19
Reputation: 168646
>>> import re
>>> s='1101100110000110110110011000001011011000101001111101100010101000'
>>> print (''.join([chr(int(x,2)) for x in re.split('(........)', s) if x ])).decode('utf-8')
نقاب
>>>
Or, the inverse:
>>> s=u'نقاب'
>>> ''.join(['{:08b}'.format(ord(x)) for x in s.encode('utf-8')])
'1101100110000110110110011000001011011000101001111101100010101000'
>>>
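On Python 3 the `chr(...)` plus `.decode` combination above no longer works, because `chr` returns text characters rather than bytes. A sketch of the same chunking approach for Python 3, assuming the input length is a multiple of 8 (any trailing partial octet would be silently ignored), builds a `bytes` object directly. The inverse uses `'{:08b}'` rather than `'{:b}'`: without zero padding, any byte below 0x80 (such as ASCII) would be written with fewer than 8 bits and the result could no longer be split back into octets.

```python
import re

s = '1101100110000110110110011000001011011000101001111101100010101000'

# re.findall with an 8-character pattern yields the octets directly,
# so there are no empty strings to filter out as with re.split.
octets = re.findall('.{8}', s)
text = bytes(int(x, 2) for x in octets).decode('utf-8')

# Inverse: iterating over a bytes object yields ints in Python 3,
# and '08b' pads each one to a full octet.
back = ''.join('{:08b}'.format(b) for b in text.encode('utf-8'))
print(back == s)  # True
```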
Upvotes: 3
Reputation: 29804
Well, the idea I have is:
1. Split the string into octets (8-bit chunks)
2. Convert each octet to an integer using int(x, 2), and then to a byte character using chr
3. Join them and decode the resulting UTF-8 byte string into Unicode
This code works for me, but I'm not sure what it prints because my console doesn't support UTF-8 (Windows :P).
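s = '1101100110000110110110011000001011011000101001111101100010101000'
# Steps 1 and 2: cut the string into 8-bit chunks and turn each into a byte character
u = "".join([chr(int(x, 2)) for x in [s[i:i+8] for i in range(0, len(s), 8)]])
# Step 3: decode the UTF-8 byte string into a Unicode string (Python 2)
d = u.decode('utf-8')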
Hope this helps!
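The three steps above translate almost directly into Python 3. This is a sketch assuming the input length is a multiple of 8; `bits_to_text` is a hypothetical helper name. Since the decoded text may not display on a console without UTF-8 support, the usage line prints the code points instead of the characters.

```python
def bits_to_text(s: str) -> str:
    """Decode a string of '0'/'1' characters as UTF-8 (hypothetical helper)."""
    if len(s) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    # 1. Split the string into octets.
    octets = [s[i:i + 8] for i in range(0, len(s), 8)]
    # 2. Convert each octet to an integer in the range 0-255.
    values = [int(x, 2) for x in octets]
    # 3. Pack the integers into bytes and decode them as UTF-8.
    return bytes(values).decode('utf-8')


s = '1101100110000110110110011000001011011000101001111101100010101000'
# Print code points rather than characters so any console can show them.
print([hex(ord(c)) for c in bits_to_text(s)])  # ['0x646', '0x642', '0x627', '0x628']
```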
Upvotes: 4
Reputation: 889
Use:
# Python 2: each octet becomes one character, so this suits ASCII input
def bin2text(s):
    return "".join([chr(int(s[i:i+8], 2)) for i in xrange(0, len(s), 8)])

>>> print bin2text("01110100011001010111001101110100")
test
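`bin2text` maps each octet to one character, which gives readable text only when every character is a single byte (ASCII). A Python 3 sketch that decodes the octets as UTF-8 instead, plus a hypothetical `text2bin` inverse, might look like this (again assuming the input length is a multiple of 8):

```python
def bin2text(s, encoding='utf-8'):
    # Each 8-bit chunk becomes one byte; the bytes are then decoded together,
    # so multi-byte UTF-8 sequences come out as single characters.
    return bytes(int(s[i:i + 8], 2) for i in range(0, len(s), 8)).decode(encoding)


def text2bin(text, encoding='utf-8'):
    # Hypothetical inverse: encode, then write every byte as exactly 8 bits.
    return ''.join(format(b, '08b') for b in text.encode(encoding))


print(bin2text("01110100011001010111001101110100"))  # test
print(text2bin("test"))  # 01110100011001010111001101110100
```

With this version the question's bit string also round-trips, because the Arabic letters are decoded from their two-byte UTF-8 sequences rather than as four separate Latin-1 characters.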
Upvotes: 1