Reputation: 1
I'm using Python sockets to receive a file. However, occasionally I get the following error:
Traceback (most recent call last):
  File "C:\Users\Sharkoon\Nextcloud\Elektronik\pythonProject\main.py", line 54, in receive_File
    received = client_socket.recv(BUFFER_SIZE).decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 34: invalid start byte
Process finished with exit code 1
This happens somewhere between the 4th and 9th execution of the code. Could it be that I am trying to decode before the file has been fully sent?
The code is the following:
    import os
    import socket

    import tqdm

    # device's IP address
    SERVER_HOST = "0.0.0.0"
    SERVER_PORT = 1337
    # receive up to 4096 bytes each time
    BUFFER_SIZE = 4096
    SEPARATOR = "<SEPARATOR>"

    def receive_File():
        # create the server socket
        # TCP socket
        s = socket.socket()
        # bind the socket to our local address
        s.bind((SERVER_HOST, SERVER_PORT))
        s.listen(5)
        print(f"[*] Listening as {SERVER_HOST}:{SERVER_PORT}")
        # accept connection if there is any
        client_socket, address = s.accept()
        # if below code is executed, that means the sender is connected
        print(f"[+] {address} is connected.")
        # receive the file info
        # receive using client socket, not server socket
        received = client_socket.recv(BUFFER_SIZE).decode()
        filename, filesize = received.split(SEPARATOR)
        # remove absolute path if there is one
        filename = os.path.basename(filename)
        # convert to integer
        filesize = int(filesize)
        # start receiving the file from the socket
        # and writing to the file stream
        progress = tqdm.tqdm(range(filesize), f"Receiving {filename}", unit="B", unit_scale=True, unit_divisor=1024)
        with open(filename, "wb") as f:
            while True:
                # read up to BUFFER_SIZE bytes from the socket (receive)
                bytes_read = client_socket.recv(BUFFER_SIZE)
                if not bytes_read:
                    # nothing is received
                    # file transmitting is done
                    break
                # write to the file the bytes we just received
                f.write(bytes_read)
                # update the progress bar
                progress.update(len(bytes_read))
        # close the client socket and the server socket
        client_socket.close()
        s.close()
        return filename
The code I am using comes from here: https://www.thepythoncode.com/article/send-receive-files-using-sockets-python
Upvotes: 0
Views: 403
Reputation: 123260
It is unknown what the server provides, but based on the code you expect the transferred data to be the filename, a separator, the file size, and then the actual file data.
    received = client_socket.recv(BUFFER_SIZE).decode()
    filename, filesize = received.split(SEPARATOR)
Here you assume that a single recv will return a buffer which contains the filename, the separator and the file size. Specifically, you expect that the buffer will a) contain all of this and b) contain only this. This likely comes from the assumption that a single send on the server will match exactly your recv in the client.
This assumption is wrong. TCP is a byte stream, not a message protocol. recv(BUFFER_SIZE) will return up to BUFFER_SIZE bytes. This might be less than the expected data, but it might also be more.
Specifically, it might already contain binary data from the file, which cannot be decoded as UTF-8, leading to:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 34: invalid start byte
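The failure can be reproduced without any network at all. The following is a minimal sketch, assuming a made-up filename (`data.bin`) and a made-up five-byte payload; it builds the kind of buffer a single recv may return when the header and the first file bytes arrive together, and shows that decoding it fails exactly where the binary data begins.

```python
SEPARATOR = "<SEPARATOR>"

# Hypothetical header and file content, chosen only for illustration.
header = f"data.bin{SEPARATOR}5".encode()
payload = bytes([0x94, 0x00, 0xFF, 0x10, 0x80])  # arbitrary binary file bytes

# What one recv(BUFFER_SIZE) may hand back: header and file data glued together.
chunk = header + payload

error = None
try:
    chunk.decode()
except UnicodeDecodeError as exc:
    # The decoder stops at the first payload byte, right after the header.
    error = exc
    print(f"decode failed at position {exc.start}: {exc.reason}")
```

Because the timing of the sender and the network decides whether the two pieces arrive in one recv or two, the error shows up only on some runs.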
To fix this, you need to know where the UTF-8 encoded header ends and where the binary data starts. This could be done by having the server prefix the header with its length, or by adding another separator between the file size and the file content.
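A minimal sketch of the length-prefix variant follows. It assumes you can change the sender as well, and that the header length is sent as a 4-byte big-endian integer; that framing, and the helper names `recv_exact`, `send_header` and `recv_header`, are illustrative choices, not part of the original code or the linked tutorial. A `socket.socketpair()` stands in for the real client/server connection so the example is self-contained.

```python
import socket
import struct

SEPARATOR = "<SEPARATOR>"

def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before all bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_header(sock, filename, filesize):
    """Sender side: 4-byte length prefix (assumed framing), then the header."""
    header = f"{filename}{SEPARATOR}{filesize}".encode()
    sock.sendall(struct.pack("!I", len(header)) + header)

def recv_header(sock):
    """Receiver side: read the length first, then only that many header bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    header = recv_exact(sock, length).decode()
    filename, filesize = header.split(SEPARATOR)
    return filename, int(filesize)

# Demo over a connected socket pair instead of a real network connection.
sender, receiver = socket.socketpair()
send_header(sender, "data.bin", 5)
sender.sendall(bytes([0x94, 0x00, 0xFF, 0x10, 0x80]))  # binary file content
sender.close()

name, size = recv_header(receiver)  # never touches the file bytes
body = recv_exact(receiver, size)   # the file content, still undecoded
receiver.close()
print(name, size, body)
```

Since the receiver never asks for more than the announced header length, the binary file bytes stay in the stream until the file-reading loop consumes them, so `.decode()` only ever sees the UTF-8 header.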
Upvotes: 1