Reputation: 570
I have several functions that work on strings. While they take different kinds of arguments, they all take one common argument called tokenizer_func (defaulting to str.split), which splits the input string into a list of tokens according to the provided function. The list of strings that is returned is then modified in each function. Since tokenizer_func seems to be a common argument and applying it is the very first line of code in all the functions, I was wondering if it would be easier to use a decorator to decorate the string modification functions. Basically, the decorator would take the tokenizer_func, apply it to the incoming string, and call the appropriate string modification function.
Edit-2
I was able to find a solution (maybe hacky?):
import random
import string

def tokenize(f):
    def _split(text, tokenizer=SingleSpaceTokenizer()):
        return tokenizer.decode(f(tokenizer.encode(text)))
    return _split

@tokenize
def change_first_letter(token_list, *_):
    # Replace the first letter of every token with a random ASCII letter
    return [random.choice(string.ascii_letters) + token[1:] for token in token_list]
This way I can call change_first_letter(text) to use the default tokenizer and change_first_letter(text, new_tokenizer) to use the new_tokenizer. If there is a better way, please let me know.
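For example, both call styles work (a quick sanity check, using the AtTokenizer defined in Edit-1 below):

text = 'hello world'
change_first_letter(text)                  # default SingleSpaceTokenizer, e.g. 'Qello Rorld'
change_first_letter(text, AtTokenizer())   # splits on '@' instead, e.g. 'Xello world'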
Edit-1:
After viewing the first reply to this question, I thought I could generalize the problem a bit to handle more involved tokenizers. Specifically, I now have this:
from abc import ABC, abstractmethod
from typing import Any, List

class Tokenizer(ABC):
    """
    Base class for Tokenizer which provides the encode and decode methods
    """
    def __init__(self, tokenizer: Any) -> None:
        self.tokenizer = tokenizer

    @abstractmethod
    def encode(self, text: str) -> List[str]:
        """
        Tokenize a string into a list of strings
        :param text: Text to be tokenized
        :return: List of tokens
        """

    @abstractmethod
    def decode(self, token_list: List[str]) -> str:
        """
        Creates a string from a token list using the tokenizer
        :param token_list: List of tokens
        :return: Reconstructed string from token list
        """

    def encode_many(self, texts: List[str]) -> List[List[str]]:
        """
        Encode multiple strings
        :param texts: List of strings to be tokenized
        :return: List of tokenized strings
        """
        return [self.encode(text) for text in texts]

    def decode_many(self, token_lists: List[List[str]]) -> List[str]:
        """
        Decode multiple strings
        :param token_lists: List of tokenized strings
        :return: List of reconstructed strings
        """
        return [self.decode(token_list) for token_list in token_lists]
class SingleSpaceTokenizer(Tokenizer):
    """
    Simple tokenizer that splits a string on whitespace using str.split
    and re-joins tokens with single spaces
    """
    def __init__(self, tokenizer=None) -> None:
        super(SingleSpaceTokenizer, self).__init__(tokenizer)

    def encode(self, text: str) -> List[str]:
        return text.split()

    def decode(self, token_list: List[str]) -> str:
        return ' '.join(token_list)
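For example, encoding and then decoding round-trips single-spaced text:

tokenizer = SingleSpaceTokenizer()
tokens = tokenizer.encode('the quick brown fox')   # ['the', 'quick', 'brown', 'fox']
tokenizer.decode(tokens)                           # 'the quick brown fox'
tokenizer.encode_many(['a b', 'c d'])              # [['a', 'b'], ['c', 'd']]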
I've written a decorator function based on a reply and some searching:
def tokenize(tokenizer):
    def _tokenize(f):
        def _split(text):
            response = tokenizer.decode(f(tokenizer.encode(text)))
            return response
        return _split
    return _tokenize
Now I am able to do this:
@tokenize(SingleSpaceTokenizer())
def change_first_letter(token_list):
    return [random.choice(string.ascii_letters) + token[1:] for token in token_list]
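Calling the decorated function on a raw string now tokenizes, transforms, and re-joins in one step (assuming random and string are imported as above):

change_first_letter('hello world')   # e.g. 'Tello Aorld'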
This works without any problems. Now let's say I, as a user, want to use another tokenizer:
class AtTokenizer(Tokenizer):
    def __init__(self, tokenizer=None):
        super(AtTokenizer, self).__init__(tokenizer)

    def encode(self, text):
        return text.split('@')

    def decode(self, token_list):
        return '@'.join(token_list)

new_tokenizer = AtTokenizer()
How would I invoke my text functions by passing this new_tokenizer?
I found out that I can call this new_tokenizer like this: tokenize(new_tokenizer)(change_first_letter)(text), if I DO NOT decorate the change_first_letter function. This seems very tedious, though. Is there a way to do this more concisely?
Original:
Here is an example of two such functions (the first one is a dummy function):
import random
import string
from typing import Callable, List
from spellchecker import SpellChecker  # pyspellchecker package

def change_first_letter(text: str, tokenizer_func: Callable[[str], List[str]] = str.split) -> str:
    words = tokenizer_func(text)
    return ' '.join([random.choice(string.ascii_letters) + word[1:] for word in words])

def spellcheck(text: str, tokenizer_func: Callable[[str], List[str]] = str.split) -> str:
    words = tokenizer_func(text)
    return ' '.join([SpellChecker().correction(word) for word in words])
As you can see, for both functions the first line applies the tokenizer function. If the tokenizer function is always str.split, then I could create a decorator that would do this for me:
def tokenize(func):
    def _split(text):
        return func(text.split())
    return _split
Then I could just decorate the other functions with @tokenize and it would work. In this case, the functions would directly take List[str]. However, the tokenizer_func is provided by the function caller. How would I pass this to the decorator? Can this be done?
Upvotes: 1
Views: 478
Reputation: 10452
def tokenize(tokenizer):
    def _tokenize(f):
        def _split(text, tokenizer=tokenizer):
            response = tokenizer.decode(f(tokenizer.encode(text)))
            return response
        return _split
    return _tokenize
That way you can call your change_first_letter in two ways:
- change_first_letter(text) to use the default tokenizer
- change_first_letter(text, new_tokenizer) to use new_tokenizer
MyPy doesn't like it when decorators change which parameters a function accepts, so if you're using MyPy you might want to write a plugin for it.
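For example (a short sketch reusing the tokenizer classes from the question):

@tokenize(SingleSpaceTokenizer())
def change_first_letter(token_list):
    return [random.choice(string.ascii_letters) + token[1:] for token in token_list]

change_first_letter('some sample text')                 # default SingleSpaceTokenizer
change_first_letter('some@sample@text', AtTokenizer())  # override with AtTokenizer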
Upvotes: 1
Reputation: 24691
The @ syntax of a decorator simply evaluates the rest of the line as a function, calls that function on the function that's defined immediately afterwards, and replaces it as such. By making the 'decorator with arguments' (tokenize()) return a regular decorator, that decorator will then encompass the original function.
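In other words, the decorated definition is just shorthand for wrapping the function by hand:

@tokenize(method=str.split)
def strfunc(text):
    print(text)

# ...is equivalent to:
def strfunc(text):
    print(text)

strfunc = tokenize(method=str.split)(strfunc)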
def tokenize(method):
    def decorator(function):
        def wrapper(text):
            return function(method(text))
        return wrapper
    return decorator

@tokenize(method=str.split)
def strfunc(text):
    print(text)

strfunc('The quick brown fox jumped over the lazy dog')
# ['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
The problem with this is that, if you were to assign a default argument (e.g. def tokenize(method=str.split):), you'd still need to call it as a function when applying the decorator:
@tokenize()
def strfunc(text):
    ...
so it might be best to not give a default argument, or to find a creative way around this problem. One possible solution would be to change the decorator's behavior depending on whether it's called with a function (in which case it decorates that function) or a string (in which case it calls str.split()):
def tokenize(method):
    def decorator(arg):
        # if argument is a function, then apply another decorator
        # otherwise, assume str.split()
        if type(arg) == type(tokenize):
            def wrapper(text):
                return arg(method(text))
            return wrapper
        else:
            return method(str.split(arg))
    return decorator
which should allow both of the following:

@tokenize  # default to str.split
def strfunc(text):
    ...

@tokenize(str.split)  # or another function of your choice
def strfunc(text):
    ...
The downside to this is that it's a bit hacky (playing with type() always is; the saving grace here is that all functions are functions, though you could instead check with callable() if you wanted it to apply to classes as well), and it makes it hard to figure out which parameters are doing what inside tokenize(), since they change purposes depending on how the method is called.
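For instance, the callable() variant mentioned above might look like this (a sketch keeping the same fallback behavior):

def tokenize(method):
    def decorator(arg):
        if callable(arg):
            # decorating a function: tokenize the text before passing it on
            def wrapper(text):
                return arg(method(text))
            return wrapper
        # bare @tokenize case: arg is the text itself, default to str.split
        return method(str.split(arg))
    return decorator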
Upvotes: 0