shaun

Reputation: 570

How to set up a decorator that takes an argument provided by the function caller?

I have several functions that work on strings. While they take different kinds of arguments, they all take one common argument called tokenizer_func (defaulted to str.split), which splits the input string into a list of tokens according to the provided function. Each function then modifies the resulting list of strings. Since tokenizer_func is a common argument and applying it is the very first line of code in every function, I was wondering whether it would be easier to use a decorator to decorate the string-modification functions. Basically, the decorator would take the tokenizer_func, apply it to the incoming string, and then call the appropriate string-modification function.

Edit-2:

I was able to find a solution (maybe hacky?):

def tokenize(f):
  def _split(text, tokenizer=SingleSpaceTokenizer()):
    # encode the text into tokens, apply f, then decode back into a string
    return tokenizer.decode(f(tokenizer.encode(text)))
  return _split

@tokenize
def change_first_letter(token_list, *_):
  return [random.choice(string.ascii_letters) + token[1:] for token in token_list]

This way I can call change_first_letter(text) to use the default tokenizer and change_first_letter(text, new_tokenizer) to use the new_tokenizer. If there is a better way, please let me know.
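One small refinement on this (a sketch, assuming the SingleSpaceTokenizer from Edit-1 below): functools.wraps keeps the wrapped function's name and docstring intact, which helps with debugging and introspection:

import functools

def tokenize(f):
  @functools.wraps(f)  # preserve f.__name__ and f.__doc__ on the wrapper
  def _split(text, tokenizer=SingleSpaceTokenizer()):
    return tokenizer.decode(f(tokenizer.encode(text)))
  return _split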

Edit-1:

After viewing the first reply to this question, I thought I could generalize the problem a bit better to handle more involved tokenizers. Specifically, I now have this:

from abc import ABC, abstractmethod
from typing import Any, List

class Tokenizer(ABC):
  """
  Base class for Tokenizer which provides the encode and decode methods
  """
  def __init__(self, tokenizer: Any) -> None:
    self.tokenizer = tokenizer

  @abstractmethod
  def encode(self, text: str) -> List[str]:
    """
    Tokenize a string into list of strings

    :param text: Text to be tokenized
    :return: List of tokens
    """

  @abstractmethod
  def decode(self, token_list: List[str]) -> str:
    """
    Creates a string from a token list using the tokenizer

    :param token_list: List of tokens
    :return: Reconstructed string from token list
    """

  def encode_many(self, texts: List[str]) -> List[List[str]]:
    """
    Encode multiple strings

    :param texts: List of strings to be tokenized
    :return: List of tokenized strings
    """
    return [self.encode(text) for text in texts]

  def decode_many(self, token_lists: List[List[str]]) -> List[str]:
    """
    Decode multiple strings

    :param token_lists: List of tokenized strings
    :return: List of reconstructed strings
    """
    return [self.decode(token_list) for token_list in token_lists]

class SingleSpaceTokenizer(Tokenizer):
  """ 
  Simple tokenizer that just splits a string on a single space using str.split
  """
  def __init__(self, tokenizer=None) -> None:
    super(SingleSpaceTokenizer, self).__init__(tokenizer)

  def encode(self, text: str) -> List[str]:
    return text.split()    

  def decode(self, token_list: List[str]) -> str:
    return ' '.join(token_list)
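As a quick sanity check of the classes above (a sketch; expected output in comments):

tok = SingleSpaceTokenizer()
tokens = tok.encode('the quick brown fox')
print(tokens)                           # ['the', 'quick', 'brown', 'fox']
print(tok.decode(tokens))               # the quick brown fox
print(tok.encode_many(['a b', 'c d']))  # [['a', 'b'], ['c', 'd']]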

I've written a decorator function based on a reply and some searching:

def tokenize(tokenizer):
  def _tokenize(f):
    def _split(text):      
      response = tokenizer.decode(f(tokenizer.encode(text)))
      return response
    return _split
  return _tokenize

Now I am able to do this:

@tokenize(SingleSpaceTokenizer())
def change_first_letter(token_list):
  return [random.choice(string.ascii_letters) + token[1:] for token in token_list]
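For example (assuming random and string are imported; output varies because the replacement letters are random):

print(change_first_letter('the quick brown fox'))
# e.g.: Xhe juick Rrown cox  (first letter of each token replaced at random)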

This works without any problems. Now let's say I, as a user, want to use another tokenizer:

class AtTokenizer(Tokenizer):
  def __init__(self, tokenizer=None):
    super(AtTokenizer, self).__init__(tokenizer)
  
  def encode(self, text):
    return text.split('@')

  def decode(self, token_list):
    return '@'.join(token_list)

new_tokenizer = AtTokenizer()

How would I invoke my text functions by passing this new_tokenizer?

I found out that I can call this new_tokenizer like this:

tokenize(new_tokenizer)(change_first_letter)(text)

if I DO NOT decorate the change_first_letter function. This seems very tedious, though. Is there a way to do this more concisely?

Original:

Here is an example of two such functions (the first one is a dummy function):

import random
import string
from typing import Callable, List

from spellchecker import SpellChecker  # from the pyspellchecker package

def change_first_letter(text: str, tokenizer_func: Callable[[str], List[str]] = str.split) -> str:
  words = tokenizer_func(text)
  return ' '.join([random.choice(string.ascii_letters) + word[1:] for word in words])

def spellcheck(text: str, tokenizer_func: Callable[[str], List[str]] = str.split) -> str:
  words = tokenizer_func(text)
  return ' '.join([SpellChecker().correction(word) for word in words])
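A caller can then swap in a different tokenizer_func; for example (the comma splitter here is just an illustration):

change_first_letter('foo bar')                          # default: str.split
change_first_letter('foo,bar', lambda s: s.split(','))  # caller-supplied tokenizer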

As you can see, for both functions the first line applies the tokenizer function. If the tokenizer function were always str.split, then I could create a decorator that would do this for me:

def tokenize(func):
  def _split(text):
    return func(text.split())
  return _split

Then I could just decorate the other functions with @tokenize and it would work. In this case, the functions would directly take List[str]. However, the tokenizer_func is provided by the function caller. How would I pass this to the decorator? Can this be done?

Upvotes: 1

Views: 478

Answers (2)

Jasmijn

Reputation: 10452

def tokenize(tokenizer):
  def _tokenize(f):
    def _split(text, tokenizer=tokenizer):      
      response = tokenizer.decode(f(tokenizer.encode(text)))
      return response
    return _split
  return _tokenize

That way you can call your change_first_letter in two ways:

  • change_first_letter(text) to use the default tokenizer
  • change_first_letter(text, new_tokenizer) to use new_tokenizer
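For concreteness, here is how that looks with the tokenizer classes from Edit-1 (a sketch; assumes random, string, and the classes are in scope):

@tokenize(SingleSpaceTokenizer())
def change_first_letter(token_list):
  return [random.choice(string.ascii_letters) + token[1:] for token in token_list]

change_first_letter('foo bar')                 # uses the default SingleSpaceTokenizer
change_first_letter('foo@bar', AtTokenizer())  # overrides the tokenizer per call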

MyPy doesn't like it when decorators change which parameters a function accepts, so if you're using MyPy you might want to write a plugin for it.

Upvotes: 1

Green Cloak Guy

Reputation: 24691

The @ syntax of a decorator simply evaluates the rest of the line as an expression, calls the resulting function on the function defined immediately afterwards, and rebinds the name to the result. By making the 'decorator with arguments' (tokenize()) return a regular decorator, that returned decorator will then wrap the original function.

def tokenize(method):
    def decorator(function):
        def wrapper(text):
            return function(method(text))
        return wrapper
    return decorator

@tokenize(method=str.split)
def strfunc(text):
    print(text)

strfunc('The quick brown fox jumped over the lazy dog')
# ['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
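To make the mechanics explicit, the @ line above is just sugar for calling the factory by hand:

def strfunc(text):
    print(text)

# identical in effect to decorating strfunc with @tokenize(method=str.split)
strfunc = tokenize(method=str.split)(strfunc)
strfunc('The quick brown fox jumped over the lazy dog')
# ['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']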

The problem with this is that, if you were to assign a default argument (e.g. def tokenize(method=str.split):), you'd still need to call it as a function when applying the decorator:

@tokenize()
def strfunc(text):
    ...

so it might be best not to give a default argument, or to find a creative way around this problem. One possible solution would be to change the decorator's behavior depending on whether it's called with a function (in which case it decorates that function) or a string (in which case it falls back to str.split()):

def tokenize(method):
    def decorator(arg):
        # if the argument is a function, we were used as @tokenize(...):
        # wrap the decorated function
        # otherwise we were applied bare as @tokenize: `method` is the
        # decorated function and `arg` is the text, so fall back to str.split
        if type(arg) == type(tokenize):
            def wrapper(text):
                return arg(method(text))
            return wrapper
        else:
            return method(str.split(arg))
    return decorator

which should allow both of the following:

@tokenize             # default to str.split
def strfunc(text):
    ...

@tokenize(str.split)  # or another function of your choice
def strfunc(text):
    ...

The downside to this is that it's a bit hacky (playing with type() always is; the saving grace here is that all functions share the same type, and you could instead check callable(arg) if you wanted it to work for classes as well), and it makes it hard to figure out which parameter is doing what inside tokenize(), since their roles change depending on how the decorator is used.
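For reference, the callable() check mentioned above would look like this (a sketch; behavior is otherwise identical, and it still misfires if the decorated function is meant to receive a callable argument):

def tokenize(method):
    def decorator(arg):
        if callable(arg):
            # used as @tokenize(some_method): arg is the decorated function
            def wrapper(text):
                return arg(method(text))
            return wrapper
        # used bare as @tokenize: method is the decorated function and
        # arg is the text, so fall back to str.split
        return method(str.split(arg))
    return decorator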

Upvotes: 0
