Julia

Reputation: 1409

Python code explanation

I have been studying this code to generate random text:

from collections import defaultdict, Counter
from random import choice, randrange

def pairwise(iterable):
    # Yield overlapping pairs: (c0, c1), (c1, c2), ...
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return  # empty input: no pairs
    for curr in it:
        yield last, curr
        last = curr

valid = set('abcdefghijklmnopqrstuvwxyz ')

def valid_pair(pair):
    # Keep a pair only if both characters are lowercase letters or spaces.
    last, curr = pair
    return last in valid and curr in valid

def make_markov(text):
    # markov[p][q] counts how often character q followed character p.
    markov = defaultdict(Counter)
    lowercased = (c.lower() for c in text)
    for p, q in filter(valid_pair, pairwise(lowercased)):
        markov[p][q] += 1
    return markov

def genrandom(model, n):
    curr = choice(list(model))
    for i in range(n):
        yield curr
        if curr not in model:   # handle case where there is no known successor
            curr = choice(list(model))
        d = model[curr]         # Counter of letters that followed curr
        target = randrange(sum(d.values()))
        cumulative = 0
        for curr, cnt in d.items():   # rebinds curr to the chosen next letter
            cumulative += cnt
            if cumulative > target:
                break

model = make_markov('The qui_.ck brown fox')
print(''.join(genrandom(model, 20)))

However, I am having trouble understanding the last part, from target = randrange(sum(d.values())) onwards. An explanation would be greatly appreciated. Thanks!


Answers (1)

HardlyKnowEm

Reputation: 3230

target = randrange(sum(d.values()))

Here d = model[curr] is the Counter of letters seen right after curr. Since model is a dictionary mapping each letter to a Counter object, and a Counter is a dictionary subclass, d.values() gives the count for each letter that followed curr in the training text (without the letters themselves). So sum(d.values()) is the total of all those counts. randrange(total) then chooses a random integer in the half-open range [0, total), that is, from 0 up to but not including total.
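To make this concrete, here is a minimal Python 3 sketch that looks at the Counter for the space character in the question's example text. It is an assumption-laden illustration, not the original code: it rebuilds the model with zip() instead of the question's pairwise helper, and the names VALID and total are illustrative only.

```python
from collections import Counter, defaultdict
from random import randrange

VALID = set('abcdefghijklmnopqrstuvwxyz ')  # same character set as `valid`

def make_markov(text):
    # Variant of the question's make_markov that pairs characters with
    # zip() instead of pairwise(): markov[p][q] counts how often q followed p.
    markov = defaultdict(Counter)
    lowered = text.lower()
    for p, q in zip(lowered, lowered[1:]):
        if p in VALID and q in VALID:
            markov[p][q] += 1
    return markov

model = make_markov('The qui_.ck brown fox')

d = model[' ']              # letters seen right after a space
print(dict(d))              # {'q': 1, 'b': 1, 'f': 1}
total = sum(d.values())     # total number of observed successors
print(total)                # 3
target = randrange(total)   # one of 0, 1, 2; never 3
print(0 <= target < total)  # True
```

Nothing follows the final 'x' (and 'i' is only followed by the invalid '_'), so neither letter is a key in model; that is the situation the `if curr not in model` check in genrandom handles.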

d.items() returns the (letter, count) pairs in the Counter, and together with target the loop picks each letter with probability proportional to its count. Suppose the counts are ('a', 5), ('b', 7), and ('c', 2). The total is 14, so target is a random integer from 0 to 13 inclusive. The loop adds each count to cumulative and stops at the first letter for which cumulative > target: if target is in [0, 5) it stops at 'a', if it is in [5, 12) it stops at 'b', and if it is in [12, 14) it stops at 'c' (assuming d.items() yields them in that order; a different order shifts the ranges but not their widths).

Because the loop variable is named curr, the letter the loop stops on becomes the new curr, which is the next character genrandom yields. The relative probabilities (5/14, 7/14, and 2/14) are the widths of those ranges, and the widths come from the counts recorded in make_markov.
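The selection step can be pulled out into a small function to check the ranges in the example above. This is a sketch: next_letter is a hypothetical helper name, and it runs every possible target instead of a random one so that the result is deterministic.

```python
from collections import Counter

def next_letter(d, target):
    # Same loop as in genrandom: accumulate counts and stop at the first
    # letter whose running total exceeds target.
    cumulative = 0
    for curr, cnt in d.items():
        cumulative += cnt
        if cumulative > target:
            break
    # After break, the loop variable still holds the letter it stopped on.
    # target must be in range(sum(d.values())), which randrange guarantees.
    return curr

d = Counter({'a': 5, 'b': 7, 'c': 2})
total = sum(d.values())  # 14

# Feed in every value randrange(total) can return: 0, 1, ..., 13.
chosen = [next_letter(d, target) for target in range(total)]
print(''.join(chosen))       # aaaaabbbbbbbcc
print(Counter(chosen) == d)  # True: each letter wins exactly `count` targets
```

Each letter is selected by as many of the 14 equally likely targets as its count. On Python 3.6 and later, random.choices(list(d), weights=list(d.values()))[0] performs the same kind of weighted draw in a single call.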

