Reputation: 1445
I have a list of 3-grams (each one a tuple) that I build like this:
from nltk import ngrams

test_data = ["this is all test data", "this not"]
three_gram_list = []
for data in test_data:
    three_grams = ngrams(data.split(" "), 3)
    for gram in three_grams:
        three_gram_list.append(gram)
What I would like to do is create a function that checks, for each 3-gram, whether a word from each of two parameter sets occurs in the same tuple. To do that, I wrote the following:
def create_specific_trigram(three_grams, parameters1, parameters2):
    condition1 = False
    condition2 = False
    for three in three_grams:
        for num in range(1, 3):
            if three[num] in parameters1:
                condition1 = True
        for num in range(1, 3):
            if three[num] in parameters2:
                condition2 = True
        if condition1 and condition2:
            print(three)
However, when I run it with some parameters:
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")
for sentence in test_data:
    create_specific_trigram(three_grams, parameters1, parameters2)
I get the following output:
('all', 'test', 'data')
('all', 'test', 'data')
However, I am only looking for one output per sentence. So in this case:
('all', 'test', 'data')
Any thoughts on what changes I should apply?
Upvotes: 1
Views: 77
Reputation: 574
Each time you call create_specific_trigram inside the loop, you pass it the same value of three_grams, regardless of the current sentence, so the same match is printed once per sentence.
Try this:
from nltk import ngrams

test_data = ["this is all test data", "this not"]
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

#============================================
# implementation of create_specific_trigram
# ...
#============================================

for sentence in test_data:
    # Build the 3-grams for the current sentence only
    three_grams = ngrams(sentence.split(" "), 3)
    create_specific_trigram(three_grams, parameters1, parameters2)
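If it helps, here is a minimal self-contained sketch of the same idea that returns the matches per sentence instead of printing them. It makes a few assumptions that are not in the question: trigrams is a hypothetical standard-library stand-in for nltk's ngrams(tokens, 3), matching_trigrams is a name I made up, and the check looks at all three positions of each tuple rather than only the last two.

```python
def trigrams(tokens):
    """Hypothetical stand-in for nltk.ngrams(tokens, 3), standard library only."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def matching_trigrams(sentence, parameters1, parameters2):
    """Return the 3-grams of one sentence that contain a word from each parameter set."""
    matches = []
    for gram in trigrams(sentence.split(" ")):
        # Both checks look only at the current tuple, so no state carries
        # over from one 3-gram (or one sentence) to the next.
        has_first = any(word in parameters1 for word in gram)
        has_second = any(word in parameters2 for word in gram)
        if has_first and has_second:
            matches.append(gram)
    return matches


test_data = ["this is all test data", "this not"]
parameters1 = ("test", "testing")
parameters2 = ("data", "datas")

# One result list per sentence; a sentence shorter than three words yields no 3-grams.
results = {s: matching_trigrams(s, parameters1, parameters2) for s in test_data}
for sentence, grams in results.items():
    print(sentence, grams)
```

Because the 3-grams are rebuilt from each sentence inside the function, a match can only be reported for the sentence it actually came from.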
Upvotes: 1