Reputation: 33
Let's say I have three sentences:
hello world
this is python
today is tuesday
If I generate bigrams for each string, I get something like this:
[('hello', 'world')]
[('this', 'is'), ('is', 'python')]
[('today', 'is'), ('is', 'tuesday')]
Is there a difference between the bigrams of a single sentence and the bigrams of two consecutive sentences? For example, "hello world. this is python" is two consecutive sentences. Would the bigrams for those two sentences look like my output above?
The code that produced it:
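from itertools import tee

def bigrams(iterable):
    # Pair each item with its successor by advancing a second iterator one step.
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3; on Python 2 use itertools.izip instead

with open("hello.txt", 'r') as f:
    for line in f:
        # Each line is handled on its own, so no pair crosses a line boundary.
        words = line.strip().split()
        bi = bigrams(words)
        print(list(bi))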
Upvotes: 0
Views: 883
Reputation: 35059
But if I want to generate bigrams for adjacent sentences, will it give the same result as the output above? If not, what would the output look like?
It depends on what you want. If you define the items of your bigrams to be whole sentences, it would look like this:
[('hello world', 'this is python'), ('this is python', 'today is tuesday')]
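A rough sketch of that sentence-level case, assuming the three sentences are already held in a list (the name `sentences` is only for illustration and is not from the question's code):

```python
# Hypothetical input: the three sentences as a list of strings.
sentences = ["hello world", "this is python", "today is tuesday"]

# Pair each sentence with its successor; zip stops when the shorter
# sequence (the one shifted by one) runs out.
sentence_bigrams = list(zip(sentences, sentences[1:]))
print(sentence_bigrams)
```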
If instead each item is a word and the bigrams run across all sentences, it would look like this:
[('hello', 'world'), ('world', 'this'), ('this', 'is'),...]
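One way to get that word-level result is to flatten all lines into a single stream of words before pairing them, rather than calling `bigrams` once per line. This is only a sketch: the `lines` list stands in for the contents of `hello.txt`, and `bigrams` is the question's helper ported to Python 3.

```python
from itertools import chain, tee

def bigrams(iterable):
    # Same helper as in the question, using zip (Python 3).
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

# Hypothetical input standing in for the lines of hello.txt.
lines = ["hello world", "this is python", "today is tuesday"]

# Flatten every line into one word stream so that pairs can span
# sentence boundaries, e.g. ('world', 'this').
all_words = chain.from_iterable(line.split() for line in lines)
word_bigrams = list(bigrams(all_words))
print(word_bigrams)
```

Because the words are joined into one sequence, the result includes the boundary pairs `('world', 'this')` and `('python', 'today')`, which the per-line version in the question never produces.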
Upvotes: 1