Kris

Reputation: 187

Pythonic way of getting all consecutive 2-tuples from list

I have a sentence as a list of words, and I'm trying to extract all the bigrams (i.e. all the consecutive 2-tuples of words) from it. So, if my sentence was

['To', 'sleep', 'perchance', 'to', 'dream']

I want to get back out

[('To', 'sleep'), ('sleep', 'perchance'), ('perchance', 'to'), ('to', 'dream')]

Currently, I'm using

zip([sentence[i] for i in range(len(sentence) - 1)], [sentence[i+1] for i in range(len(sentence) - 1)])

and then iterating over the result, but I can't help thinking there are more Pythonic ways of doing this.

Upvotes: 8

Views: 1129

Answers (3)

wim

Reputation: 362557

Here's one I prepared earlier. It's from the itertools recipes section in the official Python docs.

from itertools import tee

def pairwise(iterable):
    """Iterate in pairs

    >>> list(pairwise([0, 1, 2, 3]))
    [(0, 1), (1, 2), (2, 3)]
    >>> tuple(pairwise([])) == tuple(pairwise('x')) == ()
    True
    """
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
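
As a rough sketch of how this applies to the question's sentence: the recipe is repeated here so the snippet runs on its own, and `words` is a hypothetical generator added only to illustrate that the tee-based approach also handles iterators that cannot be sliced. On Python 3.10 and later, itertools.pairwise offers the same behavior in the standard library.

```python
from itertools import tee

def pairwise(iterable):
    # Same recipe as above, repeated so this snippet is self-contained.
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

sentence = ['To', 'sleep', 'perchance', 'to', 'dream']
bigrams = list(pairwise(sentence))

# Hypothetical generator input: it cannot be sliced, but tee() can still duplicate it.
words = (w for w in sentence)
bigrams_from_generator = list(pairwise(words))
```

Both calls should produce the four bigrams listed in the question.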

Upvotes: 2

Cory Kramer

Reputation: 117856

Same idea, but using slicing instead of indexing with range:

>>> l = ['To', 'sleep', 'perchance', 'to', 'dream']
>>> list(zip(l, l[1:]))
[('To', 'sleep'), ('sleep', 'perchance'), ('perchance', 'to'), ('to', 'dream')]

Upvotes: 0

Kevin

Reputation: 76194

You're on the right track with zip. I suggest using list slicing instead of comprehensions.

seq = ['To', 'sleep', 'perchance', 'to', 'dream']
print(list(zip(seq, seq[1:])))

Result:

[('To', 'sleep'), ('sleep', 'perchance'), ('perchance', 'to'), ('to', 'dream')]

Note that the arguments to zip don't have to be the same length, so it's fine that seq is longer than seq[1:].
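
A small sketch of that truncation behavior, written for Python 3 (where zip returns an iterator, hence the list() calls). The names `shifted`, `empty_pairs` and `single_pairs` are illustrative only and not part of the original answer.

```python
seq = ['To', 'sleep', 'perchance', 'to', 'dream']
shifted = seq[1:]  # one element shorter than seq

# zip stops as soon as the shorter input is exhausted,
# so the last word never starts a pair of its own.
pairs = list(zip(seq, shifted))

# Edge cases: slicing past the end gives an empty list rather than an error,
# so empty and single-word sentences simply yield no bigrams.
empty_pairs = list(zip([], [][1:]))
single_pairs = list(zip(['dream'], ['dream'][1:]))
```

This is why no explicit length check is needed before zipping.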

Upvotes: 9
