Barbadoug

Reputation: 21

Python - Django - Encoding UTF-8

I have a problem with text encoding. The context: I'm working on a Django 1.11, Python 3.6 app (it started on Python 2.7 and Django < 1.11 and was upgraded later).
I have to call an API that fails to recognize my text when it contains accented characters (é, è, ^). The API works fine and accepts these characters when I call it from a standalone Python 3.9 script, so the API itself is not at fault. It also accepts calls from my app when the text has no accents, so the payload structure and headers are not at fault either.

The API developer told me that he had already seen this kind of problem (though never with Django/Python) and that it was caused by my text not being correctly encoded in UTF-8.

I tried adding:

# coding: utf-8
from __future__ import unicode_literals

(to every file that handles my message), without success.

In the environment I found:

LANG=C.UTF-8 (I tried en_US.UTF-8, without success)

I tried to add:

LC_CTYPE=UTF-8 (without success)

DEFAULT_CHARSET is not defined in the app, so it would be UTF-8 by default.

If you've experienced this or have any ideas, I'd love to hear from you. Thanks

Edit: the sending class looks like this:

class MySMSAdapter:

    def __init__(self):
        self.message = 'One message with accent é not working'
        self.recipients = ["+33600000000"]

    def login(self):
        pass  # PRIVATE CODE

    def construct_messages(self) -> list:
        return [{'to': recipient, 'body': self.message} for recipient in self.recipients]

    def get_payload(self) -> str:
        # JSON assembled by hand with an f-string (the approach the question is about)
        payload = f"""{{ "accountreference": '{self.account_reference}', 
                "messages": {self.construct_messages()},
                "from": '{self.sender_sms_senderId}',
                "characterset": "Unicode",
                }}
                """
        return payload

    def get_send_headers(self) -> dict:
        return {
            "Authorization": f"Basic {self.session_key}",
            "Content-type": "application/json",
            "Accept": "application/json",
            "charset": "utf-8",
        }

    def send(self) -> requests.Response:
        self.login()
        return requests.post(url=f"{BASEURL}v1.0/messagedispatcher", headers=self.headers, data=self.payload)

Try:

def get_payload(self) -> str:
    payload = json.dumps({
        "accountreference": self.account_reference,
        "messages": self.construct_messages(),
        "from": self.sender_sms_senderId,
        "characterset": "Unicode",
    })
    return payload

The sent message is transformed into: 'One message with accent \xc3\xa9 not working', and the API decodes this correctly. Closed.

Upvotes: 0

Views: 47

Answers (1)

deceze

Reputation: 521994

You're cobbling together JSON by hand, which is a bad idea, because it's error prone and values with special characters won't be correctly encoded/escaped. For this particular question, it also means you're passing a plain string to requests.post. requests will need to encode that to bytes to send it, and by default it'll encode it to Latin-1. If your other server is expecting UTF-8, that's where the problem is.
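To make the encoding mismatch concrete, here is a minimal sketch using only the standard library (no request is sent). It takes the sample message from the question and shows that its Latin-1 bytes differ from its UTF-8 bytes, and that a receiver decoding the Latin-1 bytes as UTF-8 would fail. The variable names are illustrative, not part of the original code.

```python
message = "One message with accent é not working"

# Bytes a str body ends up as when it is encoded as Latin-1
latin1_bytes = message.encode("latin-1")
# Bytes a UTF-8-expecting server wants to receive
utf8_bytes = message.encode("utf-8")

print(latin1_bytes)  # "é" is the single byte \xe9
print(utf8_bytes)    # "é" is the two bytes \xc3\xa9

# Simulate a server that decodes the body as UTF-8
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    # \xe9 followed by a space is not a valid UTF-8 sequence
    decodes_as_utf8 = False

print(decodes_as_utf8)
```

A message without accents is pure ASCII, which is byte-for-byte identical in both encodings. That would be consistent with unaccented text working in the question.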

You should pass a dict to requests.post(json=...), and requests will encode it to JSON for you, using UTF-8 as expected by the JSON spec:

payload = {'messages': self.construct_messages(), ...}
requests.post(json=payload, ...)

Alternatively, if you pass an already formed JSON string to requests.post(data=...), you must ensure the encoding is correct yourself:

payload = '{"messages": ...}'
requests.post(data=payload.encode('utf-8'), ...)
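As a rough illustration of the second option, the sketch below builds the body with the standard json module and encodes it to UTF-8 bytes explicitly. It also checks what the hand-built f-string approach produces for comparison. The payload here is a cut-down, hypothetical version of the one in the question, and no HTTP call is made. The resulting bytes are what you would hand to requests.post(data=...).

```python
import json

# Cut-down, hypothetical payload based on the question's sample data
messages = [{"to": "+33600000000", "body": "One message with accent é not working"}]

# Hand-built variant: formatting a list into an f-string inserts its Python
# repr, which uses single quotes and is therefore not valid JSON.
hand_built = f'{{"messages": {messages}}}'
try:
    json.loads(hand_built)
    hand_built_is_valid = True
except json.JSONDecodeError:
    hand_built_is_valid = False

# json.dumps handles quoting and escaping; ensure_ascii=False keeps "é" as a
# literal character, so the explicit UTF-8 encoding step is what matters.
body = json.dumps(
    {"messages": messages, "characterset": "Unicode"},
    ensure_ascii=False,
).encode("utf-8")

# A UTF-8 aware receiver can decode and parse the bytes back unchanged
round_trip = json.loads(body.decode("utf-8"))

print(hand_built_is_valid)
print(body)
print(round_trip["messages"][0]["body"])
```

With the default ensure_ascii=True, json.dumps would instead emit the escape \u00e9, which is plain ASCII and so survives either encoding. Both forms are valid JSON.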

Upvotes: 1
