Working with strings seems more cumbersome than it needs to be in Python 3.x

Question

I have a function that takes in a string, sends it via a socket, and prints it to the console. Sending strings to this function yields some warnings that turn into other warnings when attempting to fix them.

Function:

def log(socket, sock_message):
    sock_message = sock_message.encode()
    socket.send(sock_message)
    print(sock_message.decode())

I'm attempting to call my function this way:

log(conn, "BATT " + str(random.randint(1, 100)))

And also, for simplicity:

log(conn, "SIG: 100%")

With both of the log calls, I get the warning "Type 'str' doesn't have expected attribute 'decode'". I then saw that you can pass a string as an array of bytes with bytes("my string", 'utf-8'), but with that I get the warning "Type 'str' doesn't have expected attribute 'encode'".

I'm 100% sure I'm just missing some key bit of information on how to pass strings around in Python, so what's the generally accepted way to accomplish this?

EDIT: As explained below, a str has encode and a bytes object has decode, but neither type has both, and I was confusing my IDE by reusing the same variable for both types. I fixed it by keeping a separate variable for the bytes version.

def log(sock, msg):
    sock_message = msg.encode()
    sock.send(sock_message)
    print(msg)

dsh · Accepted Answer

In Python 2 you could be very sloppy (and sometimes get away with it) when handling characters (strings) and handling bytes. Python 3 fixes this by making them two separate types: str and bytes.

You encode to convert from str to bytes. Many characters (in particular ones not in English / US-ASCII) require two or more bytes to represent them (in many encodings).

You decode to convert from bytes to str.
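As a minimal sketch of the two directions (the variable names here are illustrative and not from the question), the round trip looks roughly like this:

```python
text = "SIG: 100%"                 # str: a sequence of Unicode characters
data = text.encode("utf-8")        # str -> bytes
restored = data.decode("utf-8")    # bytes -> str

# A non-ASCII character can need more than one byte in UTF-8.
accented = "\u00e9"                # the single character "e with acute accent"
accented_bytes = accented.encode("utf-8")
print(len(accented), len(accented_bytes))

# In Python 3, str has no decode() and bytes has no encode().
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")
print(str_has_decode, bytes_has_encode)
```

This is why calling encode and decode on the same variable confuses a type checker: after the first call the variable holds a different type.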

Thus you can't decode a str. You can print a str directly, but you need to encode it to send it anywhere that requires bytes (sockets, files opened in binary mode, etc.). You also need to use the correct encoding so that the receiver of the bytes can decode them and recover the original characters. For some uses US-ASCII is sufficient. Many prefer UTF-8, in part because every character US-ASCII can represent is encoded identically in UTF-8, while UTF-8 can also represent all other Unicode characters.
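One possible way to apply this to the log function from the question is sketched below. The ENCODING constant and the use of socket.socketpair() as a stand-in for the real connection are assumptions made for the demonstration, not part of the original code.

```python
import socket

ENCODING = "utf-8"  # assumption: sender and receiver agree on this encoding


def log(sock, msg):
    """Send msg over sock as bytes and echo the original text to the console."""
    payload = msg.encode(ENCODING)  # str -> bytes, kept in its own variable
    sock.sendall(payload)           # sendall keeps sending until all bytes are written
    print(msg)                      # print accepts a str directly


# A connected pair of local sockets stands in for the real connection.
left, right = socket.socketpair()
log(left, "BATT 42")
received = right.recv(1024).decode(ENCODING)  # the receiver decodes with the same encoding
left.close()
right.close()

# Text that fits in US-ASCII produces identical bytes in UTF-8.
same_bytes = "SIG: 100%".encode("ascii") == "SIG: 100%".encode("utf-8")

# US-ASCII cannot represent characters outside its range.
try:
    "\u00e9".encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Naming the encoding explicitly on both sides avoids relying on defaults, and keeping msg and payload as separate variables means each name only ever refers to one type.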
