tsaG1337

Python socket: invalid start byte

I'm using Python sockets to receive a file. However, occasionally I get the following error:

Traceback (most recent call last):
  File "C:\Users\Sharkoon\Nextcloud\Elektronik\pythonProject\main.py", line 54, in receive_File
    received = client_socket.recv(BUFFER_SIZE).decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 34: invalid start byte

Process finished with exit code 1

This happens somewhere between the 4th and 9th execution of the code. Could it be that I am trying to decode before the file has been fully sent?

The code is the following:

import os
import socket

import tqdm

# device's IP address
SERVER_HOST = "0.0.0.0"
SERVER_PORT = 1337

# receive 4096 bytes each time
BUFFER_SIZE = 4096
SEPARATOR = "<SEPARATOR>"


def receive_File():
    # create the server socket
    # TCP socket
    s = socket.socket()
    # bind the socket to our local address
    s.bind((SERVER_HOST, SERVER_PORT))

    s.listen(5)
    print(f"[*] Listening as {SERVER_HOST}:{SERVER_PORT}")
    # accept connection if there is any
    client_socket, address = s.accept()
    # if below code is executed, that means the sender is connected
    print(f"[+] {address} is connected.")

    # receive the file infos
    # receive using client socket, not server socket
    received = client_socket.recv(BUFFER_SIZE).decode()
    filename, filesize = received.split(SEPARATOR)
    # remove absolute path if there is
    filename = os.path.basename(filename)
    # convert to integer
    filesize = int(filesize)

    # start receiving the file from the socket
    # and writing to the file stream
    progress = tqdm.tqdm(range(filesize), f"Receiving {filename}", unit="B", unit_scale=True, unit_divisor=1024)
    with open(filename, "wb") as f:
        while True:
            # read up to BUFFER_SIZE bytes from the socket (receive)
            bytes_read = client_socket.recv(BUFFER_SIZE)
            if not bytes_read:
                # nothing is received
                # file transmitting is done
                break
            # write to the file the bytes we just received
            f.write(bytes_read)
            # update the progress bar
            progress.update(len(bytes_read))


    # close the progress bar and both sockets
    progress.close()
    client_socket.close()
    s.close()
    return filename

The code I am using comes from here: https://www.thepythoncode.com/article/send-receive-files-using-sockets-python

Answers (1)

Steffen Ullrich

It is unknown what the sender provides, but based on your code you expect the transferred data to be the filename, a separator, the file size and then the actual file data.

received = client_socket.recv(BUFFER_SIZE).decode()
filename, filesize = received.split(SEPARATOR)

Here you assume that a single recv will return a buffer which contains the filename, the separator and the file size. Specifically, you expect that the buffer will a) contain all of this and b) contain only this. This likely comes from the assumption that a single send on the sending side will be matched by exactly one recv on the receiving side.
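Even when the extra bytes happen to be valid UTF-8, violating assumption b) can go wrong silently. The sketch below is illustrative only: it assumes the sender writes `filename<SEPARATOR>filesize` immediately followed by the file contents (the sender's code is not shown in the question), and it simulates one `recv(BUFFER_SIZE)` by slicing the concatenated stream. Because the made-up file starts with digits, they get glued onto the file size.

```python
SEPARATOR = "<SEPARATOR>"
BUFFER_SIZE = 4096

# Assumed sender output: header immediately followed by the file contents.
file_data = b"2024\n"  # a tiny text file that happens to start with digits
header = f"notes.txt{SEPARATOR}{len(file_data)}".encode("utf-8")
stream = header + file_data

# Simulate one recv(BUFFER_SIZE) that returned the header *and* the file.
received = stream[:BUFFER_SIZE].decode()  # no error: all bytes are valid UTF-8
filename, filesize = received.split(SEPARATOR)

# int() tolerates the trailing newline, so nothing complains here.
print(filename, int(filesize))  # prints: notes.txt 52024 (the real size is 5)
```

With the question's code this would show a wrong size in the progress bar, and the written file would lack its first five bytes, because they were consumed as part of the "header".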

This assumption is wrong. TCP is a byte stream, not a message protocol: recv(BUFFER_SIZE) returns up to BUFFER_SIZE bytes. This might be less than the expected header, but it might also be more, i.e. the header plus the beginning of the file.
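A common way to cope with short reads is a helper that keeps calling `recv()` until it has exactly the number of bytes it needs. In the sketch below, `recv_exactly` is a hypothetical name, not part of the `socket` module. It also never asks for more than it still needs, so bytes belonging to the next part of the stream stay in the socket for the next read.

```python
import socket


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (hypothetical helper).

    recv() returns *up to* the requested size, so one call may come back
    short; keep reading until n bytes have arrived.
    """
    buf = bytearray()
    while len(buf) < n:
        # Ask only for what is still missing, so bytes belonging to the
        # next part of the stream are left unread in the socket.
        chunk = sock.recv(n - len(buf))
        if not chunk:
            # An empty result means the peer closed the connection early.
            raise ConnectionError(f"expected {n} bytes, got only {len(buf)}")
        buf += chunk
    return bytes(buf)
```

You can try it locally with `socket.socketpair()`: after one end sends `b"hello world"`, `recv_exactly(other_end, 5)` returns `b"hello"` and leaves `b" world"` for the next call.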

Specifically, it might already contain binary data from the file, which cannot be decoded as UTF-8, leading to:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 34: invalid start byte
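A small simulation reproduces this kind of error. It assumes the sender writes a UTF-8 header immediately followed by binary file data, slices the stream to stand in for a single `recv()`, and uses arbitrary non-UTF-8 bytes as the "file"; none of these values come from the question.

```python
SEPARATOR = "<SEPARATOR>"
BUFFER_SIZE = 4096

# Assumed sender output: UTF-8 header immediately followed by binary data.
file_data = bytes([0x94, 0x00, 0xFF, 0x10])  # arbitrary, not valid UTF-8
header = f"photo.jpg{SEPARATOR}{len(file_data)}".encode("utf-8")
stream = header + file_data

chunk = stream[:BUFFER_SIZE]  # one recv() that got header and file data
error_position = None
error_reason = None
try:
    chunk.decode()
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that is not valid UTF-8,
    # which here is exactly where the header ends and the file begins.
    error_position = exc.start
    error_reason = exc.reason
    print(exc)

# Decoding only the header bytes works fine.
print(chunk[:len(header)].decode())
```

In this sketch the reported position equals the header length (21), i.e. the decode fails on the very first byte of file content.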

To fix this, you need to know where the UTF-8 encoded header ends and where the binary data starts. This can be done by prefixing the header with its length on the sending side, or by adding another separator between the file size and the file content.
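A minimal sketch of the length-prefix option, assuming you control both ends: the sender writes a 4-byte big-endian header length (packed with `struct`), then the UTF-8 header, then the raw file bytes; the receiver reads exactly that many header bytes, decodes only those, and treats everything after them as binary. `send_file_bytes`, `receive_file_bytes` and `recv_exactly` are hypothetical names, and the whole file is held in memory for brevity; for large files you would read the remaining `filesize` bytes in chunks and write each one to disk, as the question's loop does.

```python
import os
import socket
import struct
from typing import Tuple

SEPARATOR = "<SEPARATOR>"
BUFFER_SIZE = 4096
# Assumed framing: the header length is sent first as a 4-byte big-endian int.
HEADER_LEN = struct.Struct("!I")


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(n - len(buf), BUFFER_SIZE))
        if not chunk:
            raise ConnectionError(f"expected {n} bytes, got only {len(buf)}")
        buf += chunk
    return bytes(buf)


def send_file_bytes(sock: socket.socket, filename: str, data: bytes) -> None:
    """Hypothetical sender: length prefix, UTF-8 header, raw file bytes."""
    header = f"{filename}{SEPARATOR}{len(data)}".encode("utf-8")
    sock.sendall(HEADER_LEN.pack(len(header)))
    sock.sendall(header)
    sock.sendall(data)


def receive_file_bytes(sock: socket.socket) -> Tuple[str, bytes]:
    """Hypothetical receiver: only the header is ever decoded as UTF-8."""
    (header_len,) = HEADER_LEN.unpack(recv_exactly(sock, HEADER_LEN.size))
    header = recv_exactly(sock, header_len).decode("utf-8")
    filename, filesize = header.split(SEPARATOR)
    # Everything after the header is file content: keep it as bytes.
    data = recv_exactly(sock, int(filesize))
    return os.path.basename(filename), data
```

The separator option works too, but then the receiver has to scan the incoming bytes for that separator and keep whatever it read past it as the start of the file instead of discarding it.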
