Reputation: 4815
I have a string like this:
text = 'b\'"Bill of the one\\xe2\\x80\\x99s store wanted to go outside.\''
This is clearly meant to be a bytes object, but when I check the object's type, I get:
type(text)
<class 'str'>
I tried encoding to bytes and then decoding, but this was the result:
text.encode("utf-8").decode("utf-8")
'b\'"Bill of the oneâ\x80\x99s store wanted to go outside.\''
How can I get the text properly formatted?
Upvotes: 1
Views: 81
Reputation: 8254
As another possible approach, it seems to me that the string you have is the result of calling repr on a bytes object. You can reverse a repr by calling ast.literal_eval:
>>> import ast
>>> x = b'test string'
>>> y = repr(x)
>>> y
"b'test string'"
>>> ast.literal_eval(y)
b'test string'
Or in your case:
>>> x = 'b\'"Bill of the one\\xe2\\x80\\x99s store wanted to go outside.\''
>>> import ast
>>> ast.literal_eval(x)
b'"Bill of the one\xe2\x80\x99s store wanted to go outside.'
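If the goal is readable text rather than a bytes object, one more step is probably needed: decoding the result. The sketch below assumes the bytes are UTF-8, which the \xe2\x80\x99 sequence (a right single quotation mark in UTF-8) suggests but which the question does not confirm. The variable names raw and decoded are only illustrative.

```python
import ast

# The string from the question: the repr of a bytes object, stored as a str.
text = 'b\'"Bill of the one\\xe2\\x80\\x99s store wanted to go outside.\''

# Step 1: turn the repr back into a real bytes object.
raw = ast.literal_eval(text)

# Step 2: interpret those bytes as text (UTF-8 is an assumption here).
decoded = raw.decode("utf-8")

print(type(raw))      # the intermediate value is a bytes object
print(type(decoded))  # the final value is a str
print(decoded)        # the three escaped bytes collapse into one apostrophe character
```

Note that ast.literal_eval only accepts Python literals, so it should raise an exception rather than execute anything if the input turns out not to be a valid repr.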
Upvotes: 3
Reputation: 416
Why are you calling both encode and decode on the string? If you do both, you end up back where you started, with a str. Calling encode alone is sufficient.
text = 'b\'"Bill of the one\\xe2\\x80\\x99s store wanted to go outside.\''
type(text)  # This will output <class 'str'>
To get a bytes object, use the snippet below:
byte_object = text.encode("utf-8")
type(byte_object)  # This will output <class 'bytes'>
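It may be worth checking what the resulting bytes actually contain. The following is a small comparison sketch, not part of the snippet above; the names encoded and parsed are illustrative. It compares text.encode with ast.literal_eval on the same input. As far as I can tell, encode keeps the leading b' and the backslash escapes as literal characters, so the type changes but the content is still the repr.

```python
import ast

text = 'b\'"Bill of the one\\xe2\\x80\\x99s store wanted to go outside.\''

encoded = text.encode("utf-8")   # bytes of the characters exactly as they appear in text
parsed = ast.literal_eval(text)  # bytes that the repr describes

print(type(encoded), type(parsed))  # both are bytes objects
print(encoded[:2])                  # the b' prefix is now part of the data
print(b"\\xe2" in encoded)          # the escape is still a backslash followed by 'xe2'
print(b"\xe2\x80\x99" in parsed)    # here it is the actual three-byte sequence
```

So if the content matters and not just the type, encode on its own is unlikely to give the intended result for this particular string.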
Upvotes: 1