Reputation: 21
I have a problem with text encoding.
The context: I'm working on a django 1.11, python 3.6 app (it started in python 2.7 and django < 1.11 and was upgraded later).
I have to use an API that doesn't recognize my text when there are characters with accents (é-è-^).
The API works fine and accepts these characters when I call it from a script in Python 3.9, so the API itself is not at fault. It also accepts calls from my app when the text has no accents, so the payload structure and headers are not at fault either.
The api developer told me that he'd already had this kind of problem (never with Django python) and that it was due to my text not being encoded correctly in UTF-8.
I tried adding :
# coding: utf-8
from __future__ import unicode_literals
(in all the files that could see my message) without success
In the environment I found:
LANG=C.UTF-8 (I tried en_US.UTF-8, without success)
I tried to add:
LC_CTYPE=UTF-8 (without success)
DEFAULT_CHARSET is not defined in the app, so it would be UTF-8 by default.
If you've experienced this or have any ideas, I'd love to hear from you. Thanks
Edit: The sending class looks like this:
class MySMSAdapter:
    def __init__(self):
        self.message = 'One message with accent é not working'
        self.recipients = ["+33600000000"]

    def login(self):
        ...  # PRIVATE CODE

    def construct_messages(self) -> list:
        return [{'to': recipient, 'body': self.message} for recipient in self.recipients]

    def get_payload(self) -> str:
        payload = f"""{{ "accountreference": '{self.account_reference}',
                "messages": {self.construct_messages()},
                "from": '{self.sender_sms_senderId}',
                "characterset": "Unicode",
            }}
        """
        return payload

    def get_send_headers(self) -> dict:
        return {
            "Authorization": f"Basic {self.session_key}",
            "Content-type": "application/json",
            "Accept": "application/json",
            "charset": "utf-8",
        }

    def send(self) -> requests.Response:
        self.login()
        return requests.post(url=f"{BASEURL}v1.0/messagedispatcher", headers=self.headers, data=self.payload)
Try:
    # requires "import json" at the top of the module
    def get_payload(self) -> str:
        payload = json.dumps({
            "accountreference": self.account_reference,
            "messages": self.construct_messages(),
            "from": self.sender_sms_senderId,
            "characterset": "Unicode",
        })
        return payload
The sent message is transformed into: 'One message with accent \xc3\xa9 not working', and the API decodes this correctly. Closed.
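For anyone wondering why this works, here is a minimal sketch of what json.dumps does with its default settings. It only uses the standard library, and the "body" key and sample message are taken from the question. With the default ensure_ascii=True, non-ASCII characters are written as \uXXXX escapes, so the JSON text is pure ASCII and comes out the same whichever encoding is applied on the wire.

```python
import json

message = "One message with accent é not working"

# Default settings (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps({"body": message})
print(escaped)  # {"body": "One message with accent \u00e9 not working"}

# The result is pure ASCII, so Latin-1 and UTF-8 produce identical bytes.
assert escaped.encode("latin-1") == escaped.encode("utf-8")

# A JSON parser on the receiving side restores the original character,
# whose UTF-8 representation is the two bytes \xc3\xa9.
decoded = json.loads(escaped)["body"]
print(decoded.encode("utf-8"))  # b'One message with accent \xc3\xa9 not working'
```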
Upvotes: 0
Views: 47
Reputation: 521994
You're cobbling together JSON by hand, which is a bad idea: it's error-prone, and values with special characters won't be correctly encoded or escaped. For this particular question, it also means you're passing a plain string to requests.post. requests will need to encode that to bytes to send it, and by default it'll encode it to Latin-1. If your other server is expecting UTF-8, that's where the problem is.
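Both issues can be reproduced with the standard library alone. The sketch below is illustrative, not your actual code: the sample message and phone number are borrowed from the question, and the variable names are made up for the demonstration.

```python
import json

message = "accent é"
messages = [{"to": "+33600000000", "body": message}]

# 1) Interpolating a Python list into an f-string inserts its repr(),
#    which uses single quotes and is therefore not valid JSON.
hand_built = f'{{"messages": {messages}}}'
try:
    json.loads(hand_built)
    valid_json = True
except json.JSONDecodeError:
    valid_json = False
print(valid_json)  # False

# 2) The same text gives different bytes under Latin-1 and UTF-8.
latin1_bytes = message.encode("latin-1")  # b'accent \xe9'
utf8_bytes = message.encode("utf-8")      # b'accent \xc3\xa9'

# A server that decodes the Latin-1 bytes as UTF-8 cannot read the accent.
try:
    latin1_bytes.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False
print(readable_as_utf8)  # False
```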
You should pass a dict to requests.post(json=...), and requests will encode it to JSON for you, using UTF-8 as expected by the JSON spec:
payload = {'messages': self.construct_messages(), ...}
requests.post(json=payload, ...)
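Spelled out a bit more, this could look roughly like the sketch below. The payload keys and the endpoint path come from the question; build_payload, send, base_url, session_key and the 10-second timeout are hypothetical names and values chosen for the example, so adapt them to your adapter class.

```python
def build_payload(account_reference, sender_id, message, recipients) -> dict:
    """Return a plain dict; JSON serialization is left to requests."""
    return {
        "accountreference": account_reference,
        "messages": [{"to": r, "body": message} for r in recipients],
        "from": sender_id,
        "characterset": "Unicode",
    }


def send(base_url, session_key, payload):
    # Third-party dependency, imported here so build_payload can be
    # used and tested without requests installed.
    import requests

    return requests.post(
        f"{base_url}v1.0/messagedispatcher",
        headers={
            "Authorization": f"Basic {session_key}",
            "Accept": "application/json",
        },
        json=payload,  # requests serializes the dict and sets Content-Type
        timeout=10,    # assumption: pick a value that suits the API
    )
```

Since json= is used, there is no need to set Content-Type by hand, and no string ever reaches requests that it would have to encode on its own.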
Alternatively, if you pass an already formed JSON string to requests.post(data=...), you must ensure the encoding is correct yourself:
payload = '{"messages": ...}'
requests.post(data=payload.encode('utf-8'), ...)
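A small standard-library sketch of that second route, assuming you still want json.dumps to build the string rather than writing it by hand. The sample payload is illustrative. Note that a charset is normally given as a parameter of the Content-Type header, not as a separate "charset" header as in the question.

```python
import json

payload = {"messages": [{"to": "+33600000000", "body": "accent é"}]}

# ensure_ascii=False keeps "é" as a literal character in the JSON text...
text = json.dumps(payload, ensure_ascii=False)

# ...so the bytes handed to requests.post(data=...) must be encoded explicitly.
body = text.encode("utf-8")

# Headers to send along with the bytes.
headers = {"Content-Type": "application/json; charset=utf-8"}

# Round trip: decoding the bytes as UTF-8 and parsing restores the payload.
assert json.loads(body.decode("utf-8")) == payload
```

Passing bytes means requests sends them unchanged, so the Latin-1 fallback never comes into play.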
Upvotes: 1