user3295674

Reputation: 913

Python mapper for adjacent pairs

I am writing a MapReduce program. This is the map step, which should emit bigrams (adjacent word pairs) from text read on stdin.

This is my concept, half pseudocode:

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for pair in words: #HERE***
        print '%s\t%s' % (pair,1)

How can I extract each adjacent pair of words, so that the mapper outputs every pair in a form such as "word1 word2, 1" and my reducer can combine them? I'd like to keep the format as close to this as possible.

Thank you.

Upvotes: 0

Views: 156

Answers (1)

Nafiul Islam

Reputation: 82450

You can pair adjacent items with the pairwise recipe from the itertools documentation:

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
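
To tie this back to the mapper in the question, here is a minimal sketch of how pairwise could replace the inner loop. It assumes Python 3 (the question uses the Python 2 print statement) and that pairs should not span line boundaries. The names map_bigrams and main are hypothetical, chosen only for illustration.

```python
import sys
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)


def map_bigrams(lines):
    """Yield one 'word1 word2<TAB>1' record per adjacent pair in each line.

    Hypothetical helper: pairs are formed within a line only, so the last
    word of one line is never paired with the first word of the next.
    """
    for line in lines:
        words = line.strip().split()
        for first, second in pairwise(words):
            yield '%s %s\t%s' % (first, second, 1)


def main():
    # Hypothetical entry point: read text from stdin, print one record per pair.
    for record in map_bigrams(sys.stdin):
        print(record)

# In the mapper script itself, finish with a call to main().
```

A line with fewer than two words produces no output, because zip stops as soon as the shorter iterator is exhausted. Keeping the record as "word1 word2", a tab, then 1 stays close to the format in the question, so the reducer can split on the tab and sum the counts per pair.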

Upvotes: 2
