PythonNewbie

Reputation: 1163

How to replace the contents of a `set` in place in Python

I pass each thread a reference to a set, and I want to be able to replace the contents of that set with the result of a (mocked) database call. So far I have:

import threading
import time
from typing import Callable

from loguru import logger


class MonitorProduct:

    def __init__(self, term: str, is_alive: Callable[[str], bool]) -> None:
        self.is_alive = is_alive
        self.term = term

    def do_request(self) -> None:
        time.sleep(.1)
        while True:
            logger.info(f'Checking {self.term}')
            if not self.is_alive(self.term):
                logger.info(f'Deleting term from monitoring: "{self.term}"')
                return

            time.sleep(5)


# mocked database
def database_terms() -> set[str]:
    return {
        'hello world',
        'python 3',
        'world',
        'wth',
    }


def database_terms_2() -> set[str]:
    return {
        'what am I doing wrong',
    }


def main() -> None:
    terms: set[str] = set()

    while True:
        db_terms = database_terms()
        diff = db_terms - terms
        terms.symmetric_difference_update(db_terms)

        for url in diff:
            logger.info(f'Starting URL: {url}')
            threading.Thread(
                target=MonitorProduct(url, terms.__contains__).do_request
            ).start()

        time.sleep(2)

        # ----------------------------------------------- #

        db_terms = database_terms_2()
        diff = db_terms - terms
        terms.symmetric_difference_update(db_terms)  # <--- terms should now contain only `what am I doing wrong`

        # Start the new URLS
        for url in diff:
            logger.info(f'Starting URL 2: {url}')
            threading.Thread(
                target=MonitorProduct(url, terms.__contains__).do_request
            ).start()

        time.sleep(10)


if __name__ == '__main__':
    main()

On the first database call, a thread should be started for each of these terms:

{
  'hello world',
  'python 3',
  'world',
  'wth',
}

and, as you can see, each thread is also given terms.__contains__.

The second database call should then replace the contents of terms with

{
  'what am I doing wrong',
}

which should cause the four running threads to exit, because of this check:

def do_request(self) -> None:
    time.sleep(.1)
    while True:
        logger.info(f'Checking {self.term}')
        if not self.is_alive(self.term):
            logger.info(f'Deleting term from monitoring: "{self.term}"')
            return

        time.sleep(5)

However, we cannot replace terms by writing terms = ..., because that creates a new set and binds it to the name terms, while the threads still hold a reference to the old set.
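A minimal illustration of the rebinding issue described above. The names `old_set`, `after_rebind` and `after_mutation` are only for this sketch and do not appear in the program itself:

```python
terms = {"hello world", "world"}
old_set = terms                   # second name for the same set object
is_alive = terms.__contains__     # bound to that set object, not to the name `terms`

# Rebinding: `terms` now names a brand-new set; the old set is untouched.
terms = {"what am I doing wrong"}
after_rebind = is_alive("world")      # still True, the old set is checked

# In-place mutation of the original object is visible through `is_alive`.
old_set.clear()
old_set.update(terms)
after_mutation = is_alive("world")    # now False
```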

My question is: how can I update the existing terms set so that it matches the newest set, without binding a new set?

Upvotes: 2

Views: 172

Answers (1)

Dan Getz

Reputation: 9153

You're almost there, but

diff = db_terms - terms
terms ^= diff  # symmetric_difference_update()

isn't enough, because it only adds the new values, so it's the same as

terms |= diff  # update()

or even

terms |= db_terms  # update()

(Either of these options should be clearer to the reader than the symmetric difference, because you're not using the symmetric difference to remove anything.)
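A quick check of that equivalence, as a sketch with made-up values ("a", "b", "c" are placeholders, not terms from the question). Because diff is disjoint from terms by construction, all three operations give the same result and none of them removes anything:

```python
terms = {"a", "b"}
db_terms = {"b", "c"}
diff = db_terms - terms      # {"c"}: never overlaps with terms

via_xor = set(terms)
via_xor ^= diff              # symmetric_difference_update()

via_or_diff = set(terms)
via_or_diff |= diff          # update() with only the new values

via_or_all = set(terms)
via_or_all |= db_terms       # update() with everything from the database

# "a" is no longer in db_terms, yet it survives in every variant.
same_result = via_xor == via_or_diff == via_or_all
```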

To remove the old values, you also want to do

terms &= db_terms  # intersection_update()
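Putting the add and remove steps together might look like the sketch below. The helper name `sync_in_place` is hypothetical; the point is only that the set object is mutated rather than rebound, so a previously captured `terms.__contains__` sees the change:

```python
def sync_in_place(terms: set[str], db_terms: set[str]) -> set[str]:
    """Make `terms` equal to `db_terms` without rebinding it.

    Returns the terms that were newly added, so the caller can start
    threads for them.
    """
    added = db_terms - terms   # compute before mutating
    terms |= db_terms          # update(): add the new values
    terms &= db_terms          # intersection_update(): drop the old ones
    return added


terms: set[str] = set()
is_alive = terms.__contains__  # what each monitoring thread would hold

first = sync_in_place(terms, {"hello world", "python 3", "world", "wth"})
alive_before = is_alive("world")

second = sync_in_place(terms, {"what am I doing wrong"})
alive_after = is_alive("world")
```

In the question's main(), each pair of `diff = ...` and `terms.symmetric_difference_update(db_terms)` lines could then be replaced by `diff = sync_in_place(terms, db_terms)`.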

You said you're concerned about race conditions with intermediate values of the set. If you want to modify the set from more than one thread, you should guard it with a mutex lock (threading.RLock). But if you're only modifying it from one thread and only checking membership (__contains__) from the others, you can avoid a lock in CPython, as long as each step of execution keeps your set in a consistent state.
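If you do go the locking route, one possible shape is a small wrapper that holds the set and the lock together. This is only a sketch: the class name `SharedTerms` and its `replace` method are assumptions made for illustration, not part of the question's code or of any library.

```python
import threading


class SharedTerms:
    """A set of terms guarded by a lock (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._terms: set[str] = set()

    def __contains__(self, term: str) -> bool:
        # Readers take the lock, so they never observe a half-updated set.
        with self._lock:
            return term in self._terms

    def replace(self, new_terms: set[str]) -> set[str]:
        """Swap in `new_terms` in place and return the newly added terms."""
        with self._lock:
            added = new_terms - self._terms
            self._terms.clear()
            self._terms.update(new_terms)
            return added


shared = SharedTerms()
added_first = shared.replace({"hello world", "world"})
added_second = shared.replace({"world", "what am I doing wrong"})
```

A thread could then be handed `shared.__contains__` in place of `terms.__contains__`. Because every read and write happens under the lock, the momentarily empty set between `clear()` and `update()` is never visible to the monitoring threads.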

Upvotes: 1
