user9797

Reputation: 33

Bigrams of adjacent sentences in Python

Let's say I have three sentences:

  1. hello world

  2. this is python

  3. today is tuesday

If I generate the bigrams of each sentence separately, I get something like this:

[('hello', 'world')]
[('this', 'is'), ('is', 'python')]
[('today', 'is'), ('is', 'tuesday')]

Is there a difference between the bigrams of a single sentence and the bigrams of two consecutive sentences? For example, "hello world. this is python" is two consecutive sentences. Would the bigrams for these two consecutive sentences look like my output above?

The code that produced it:

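from itertools import tee

def bigrams(iterable):
    # Pair each item with its successor: (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)  # Python 3: built-in zip is lazy and replaces itertools.izip

with open("hello.txt", 'r') as f:
    for line in f:
        words = line.strip().split()
        bi = bigrams(words)
        print(list(bi))  # one list of bigrams per line of the file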

Upvotes: 0

Views: 883

Answers (1)

Constantinius

Reputation: 35059

But if I want to generate bigrams for the adjacent sentences, will it give the same result as the above output? If not, what would the output look like?

It depends on what you want. If you define each item of a bigram to be a whole sentence, the result would look like this:

[('hello world', 'this is python'), ('this is python', 'today is tuesday')]
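
As a minimal sketch of the sentence-level case, assuming Python 3 (built-in zip in place of izip) and the three sentences held in a list rather than read from hello.txt, the bigrams helper from the question can be applied to the sentences themselves. The names sentences and sentence_bigrams are only illustrative:

```python
from itertools import tee

def bigrams(iterable):
    """Pair each item with its successor: (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)

# Assumption: the sentences are already split, one string per sentence.
sentences = ["hello world", "this is python", "today is tuesday"]

# Each bigram item is a whole sentence, not a word.
sentence_bigrams = list(bigrams(sentences))
print(sentence_bigrams)
```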

If you instead want bigrams whose items are words, computed across all sentences, the result would look like this:

[('hello', 'world'), ('world', 'this'), ('this', 'is'),...]
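
A rough sketch of the word-level case, again assuming Python 3 and an in-memory list of sentences (the names sentences, all_words and word_bigrams are only illustrative): chain the words of all sentences into one stream before pairing them.

```python
from itertools import chain, tee

def bigrams(iterable):
    """Pair each item with its successor: (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)

# Assumption: the sentences are already split, one string per sentence.
sentences = ["hello world", "this is python", "today is tuesday"]

# Flatten the words of every sentence into a single stream, so that
# pairs are formed across sentence boundaries as well as within them.
all_words = chain.from_iterable(s.split() for s in sentences)
word_bigrams = list(bigrams(all_words))
print(word_bigrams)
```

Unlike the per-line loop in the question, this also yields pairs that straddle a sentence boundary, such as ('world', 'this') and ('python', 'today').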

Upvotes: 1
