user7649186
user7649186

Reputation:

Why is this python code not thread-safe?

I handed this in as part a school assignment and the person who marked it mentioned that this section was not thread safe.

The assignment was to create a multithreaded socket server in python that accepted a number and returned the Fibonacci value of that number. My approach was to memoize the calculations by sharing a dictionary between each of the threads.

Here is the code (with error handling and such removed for brevity)

from socketserver import ThreadingMixIn, TCPServer, BaseRequestHandler


class FibonacciThreadedTCPServer(ThreadingMixIn, TCPServer):
    def __init__(self, server_address):
        TCPServer.__init__(self, server_address, FibonacciThreadedTCPRequestHandler, bind_and_activate=True)
        #this dictionary will be shared between all Request handlers
        self.fib_dict = {0: 0, 1: 1, 2: 1}


class FibonacciThreadedTCPRequestHandler(BaseRequestHandler):
    def handle(self):
        data = self.request.recv(1024).strip()
        num = int(data)
        result = self.calc_fib(self.server.fib_dict, num)
        ret = bytes(str(result) + '\n', 'ascii')
        self.request.sendall(ret)


    @staticmethod
    def calc_fib(fib_dict, n):
        """
        Calculates the fibonacci value of n using a shared lookup table and a linear calculation.
        """
        length = len(fib_dict)
        while length <= n:
            fib_dict[length] = fib_dict[length - 1] + fib_dict[length - 2]
            length = len(fib_dict)
        return fib_dict[n]

I understand that there are reads and writes occuring simultaneously in the calc_fib method, and normally that means that code is not thread-safe. However in this instance I think it's possible to prove that the code will always provide predictable results.

Is the fact that reads and writes can be occuring simultaneously enough for it to not be considered thread-safe? Or is something considered thread-safe if it always returns a reliabel result.

Why I think this code will always produce reliable results:

  1. A read will never occur on any given index in the dictionary until a write has occurred there.

  2. Any subsequent writes to any given index will contain the same number as the previous writes, so regardless of when the read/write sequence occurs it will always receive the same data.

I have tested this by added random sleeps between each operation and making requests with several hundred threads simultaneously and the right answer is already returned during my test.

Any thoughts or criticism would be appreciated. Thanks.

Upvotes: 7

Views: 5554

Answers (2)

ShadowRanger
ShadowRanger

Reputation: 155418

In this particular case, the GIL should keep your code safe, because:

  1. CPython built-in data structures are protected from actual damage (as opposed to merely incorrect behavior) by the GIL (dict in particular needs this guarantee, because class instances and non-local scope usually use dicts for attribute/name lookup, and without the GIL, simply reading the values would be fraught with danger)
  2. You're updating a cached value of the length then using it for the next set of operations instead of rechecking the length during the mutation; this can lead to repeated work (multiple threads see the old length and compute the new value repeatedly) but since the key is always set to the same value, it doesn't matter if they each set it independently
  3. You never delete from your cache (if you did, that cached length would bite you)

So in CPython, this should be fine. I can't make any guarantees for other Python interpreters though; without the GIL, if they implement their dict without internal locking, it's wholly possible a rehash operation triggered by a write in one thread could cause another thread to read from a dict in an inconsistent/unusable state.

Upvotes: 4

Solomon Slow
Solomon Slow

Reputation: 27115

First of all, why do you think that dictionaries are thread-safe? I did a quick search of Python3 documentation (I'm a Python novice myself), and I can not find any reassurance that two unsynchronized threads can safely update the same dictionary without corrupting the dictionary internals and maybe crashing the program.

I've been writing multi-threaded code in other languages since 1980-something, and I've learned never to trust that something is thread safe just because it acts that way when I test it. I want to see documentation that it's supposed to be thread safe. Otherwise, I throw a mutex around it.

Second of all, you are assuming that fib_dict[length - 1] and fib_dict[length - 2] will be valid. My experience with other programming languages says not to assume it. In other programming languages, (e.g., Java), when threads share data without synchronization, it is possible for one thread to see variable updates happen in a different order from the order in which some other thread performed them. E.g., it is theoretically possible for a Java thread that accesses a Map with no synchronization to see the size() of the Map increase before it sees the new values actually appear in the map. I will assume that something similar could happen in Python until somebody shows me official documentation that says otherwise.

Upvotes: 1

Related Questions