alfredodeza

Reputation: 5188

Efficient and clean way of growing a tokenizer function in Python

I have a library that does some "translation" and uses the awesome tokenize.generate_tokens() function to do so.

It is pretty fast and works correctly. But as I translate more tokens, the function keeps growing, and if and elif conditions start to pop up all over. I also keep a few variables outside the generator that keep track of "last keyword seen" and similar state.

A good example of this is the one in the Python documentation (at the bottom of the linked section): http://docs.python.org/library/tokenize.html#tokenize.untokenize

Every time I add a new thing to translate, this function grows by a couple of conditionals. I don't think a function with so many conditionals is the proper foundation to keep growing on.

Furthermore, I feel that the tokenizer consumes a lot of irrelevant lines that do not contain any of the keywords I am translating.

So 2 questions:

  1. How can I avoid adding more and more conditional statements, so that this translation function stays easy and clean to grow (without a performance hit)?

  2. How can I make it efficient to skip all the irrelevant lines I am not interested in?

Upvotes: 1

Views: 336

Answers (1)

unutbu

Reputation: 880239

You could use a dict dispatcher. For example, the code you linked to might look like this:

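from tokenize import NAME, NUMBER, OP, STRING

def process_number(result, toknum, tokval):
    # Wrap float literals in Decimal(...); pass other numbers through unchanged.
    if '.' in tokval:
        result.extend([
            (NAME, 'Decimal'),
            (OP, '('),
            (STRING, repr(tokval)),
            (OP, ')'),
        ])
    else:
        result.append((toknum, tokval))

def process_default(result, toknum, tokval):
    # Tokens without a registered handler are copied as-is.
    result.append((toknum, tokval))

# Map token types to handlers; unknown types fall back to process_default.
dispatcher = {NUMBER: process_number}
result = []
# g is the generator from generate_tokens(), as in the linked example.
for toknum, tokval, _, _, _ in g:
    dispatcher.get(toknum, process_default)(result, toknum, tokval)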

Instead of adding more if-blocks, you add key-value pairs to dispatcher.
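
To make that concrete, here is a self-contained sketch of the same idea that runs end to end on Python 3. The `process_name` handler and the `foo` to `bar` rename are made up purely for illustration, and `translate` is a hypothetical wrapper name, not something from your library:

```python
import io
from tokenize import NAME, NUMBER, OP, STRING, generate_tokens, untokenize


def process_number(result, toknum, tokval):
    # Wrap float literals in Decimal(...); leave integers alone.
    if '.' in tokval:
        result.extend([
            (NAME, 'Decimal'),
            (OP, '('),
            (STRING, repr(tokval)),
            (OP, ')'),
        ])
    else:
        result.append((toknum, tokval))


def process_name(result, toknum, tokval):
    # Hypothetical extra rule: rename the identifier `foo` to `bar`.
    result.append((toknum, 'bar' if tokval == 'foo' else tokval))


def process_default(result, toknum, tokval):
    # Tokens without a registered handler are copied as-is.
    result.append((toknum, tokval))


dispatcher = {NUMBER: process_number}
dispatcher[NAME] = process_name  # a new translation is one new entry


def translate(source):
    result = []
    readline = io.StringIO(source).readline
    for toknum, tokval, _, _, _ in generate_tokens(readline):
        dispatcher.get(toknum, process_default)(result, toknum, tokval)
    return untokenize(result)
```

Calling `translate("foo = 1.5 + 2\n")` should give source that assigns `Decimal('1.5') + 2` to `bar`. The spacing will look odd, because untokenize() only guarantees round-trippable token content when it is fed 2-tuples, not the original layout.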

This may be more efficient than evaluating a long chain of if-elif conditionals, since a dict lookup is O(1) on average while the chain tests its conditions one by one, but it does require a function call. You'll have to benchmark to see how this compares to many if-elif blocks.
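
A rough way to run that benchmark is sketched below with timeit. Everything here is a stand-in: the handlers do no real work, the token stream is synthetic, and the names (`via_if_chain`, `via_dict`, `TOKENS`) are hypothetical, so treat the numbers as indicative only and re-run it against your real tokens:

```python
import timeit
from tokenize import COMMENT, NAME, NEWLINE, NUMBER, OP, STRING


def handle(tokval):
    # Stand-in for real translation work, shared by both variants.
    return tokval


def passthrough(tokval):
    return tokval


def via_if_chain(toknum, tokval):
    if toknum == NUMBER:
        return handle(tokval)
    elif toknum == NAME:
        return handle(tokval)
    elif toknum == STRING:
        return handle(tokval)
    elif toknum == OP:
        return handle(tokval)
    elif toknum == COMMENT:
        return handle(tokval)
    return passthrough(tokval)


DISPATCH = {NUMBER: handle, NAME: handle, STRING: handle,
            OP: handle, COMMENT: handle}


def via_dict(toknum, tokval):
    return DISPATCH.get(toknum, passthrough)(tokval)


# Synthetic token stream; replace with tokens from your real input.
TOKENS = [(NAME, 'x'), (OP, '='), (NUMBER, '1.5'),
          (NEWLINE, '\n'), (COMMENT, '# c')] * 1000


def run(func):
    return [func(toknum, tokval) for toknum, tokval in TOKENS]


if __name__ == '__main__':
    for func in (via_if_chain, via_dict):
        seconds = timeit.timeit(lambda: run(func), number=100)
        print(func.__name__, seconds)
```

Both variants call a handler function, so the comparison isolates the cost of the chain against the cost of the lookup. If your if-blocks do their work inline, the chain avoids that call and may come out ahead for a small number of branches.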

I think its main advantage is that it keeps code organized in small(er), comprehensible units.
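
The same organization can also absorb the "last keyword seen" variables you keep outside the generator. One possible arrangement, and only a sketch, is to put the handlers and the state on a class. The `Translator` class, its `last_name` attribute, and the rename rule are all hypothetical:

```python
import io
from tokenize import NAME, generate_tokens, untokenize


class Translator:
    """Dict-dispatch translator that keeps its state on the instance."""

    def __init__(self):
        self.result = []
        self.last_name = None  # hypothetical "last keyword seen" state
        self.handlers = {NAME: self.on_name}

    def on_name(self, toknum, tokval):
        # Hypothetical rule: rename `foo` only when it directly follows `def`.
        if self.last_name == 'def' and tokval == 'foo':
            tokval = 'bar'
        self.last_name = tokval
        self.result.append((toknum, tokval))

    def on_default(self, toknum, tokval):
        # Tokens without a registered handler are copied as-is.
        self.result.append((toknum, tokval))

    def translate(self, source):
        readline = io.StringIO(source).readline
        for toknum, tokval, _, _, _ in generate_tokens(readline):
            self.handlers.get(toknum, self.on_default)(toknum, tokval)
        return untokenize(self.result)
```

Each handler stays a small unit, and the state it depends on is visible in one place instead of in free variables around the loop. Use a fresh `Translator()` per input so state does not leak between translations.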

Upvotes: 3
