Reputation: 1
So I have a problem: I need to find sentences containing certain words in a text and output those sentences with their indexes (I mean the sentence number within the text).
Using the NLTK library I split the text into sentences and output the ones I need:
Code:
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
search_words = ["Ipsum", "Aldus"]
matches = []
sentences = sent_tokenize(text)
for word in search_words:
    for sentence in sentences:
        if word in sentence:
            matches.append(sentence)
print(matches)
Using len I also got the overall number of sentences, but I can't get the matched sentences to output their indexes when I try to use .index:
index = sentences.index(matches)
print(index)
Does anybody know how to resolve this?
I've tried to get the indexes of certain sentences.
Upvotes: 0
Views: 111
Reputation: 393
The index method takes a single search item, not a list. All you need to do is this:
for match in matches:
    print(sentences.index(match))
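As a side note (my addition, not part of the original answer): list.index only reports the position of the first equal element, and it raises a ValueError if the element is not in the list at all, which is also why sentences.index(matches) fails when you pass the whole matches list.

# Behaviour of list.index on a plain list of strings:
parts = ["alpha", "beta", "alpha"]
print(parts.index("alpha"))  # 0 -- only the first occurrence is reported
# parts.index("gamma")       # would raise ValueError: 'gamma' is not in list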
Depending on your use case, you might also want to try something like this:
sentences = sent_tokenize(text)
search_words = ["Ipsum", "Aldus"]
for word in search_words:
    for index, sentence in enumerate(sentences):
        if word in sentence:
            print(index)
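If you want the matching sentences and their indexes together (which is closer to what the question asks for), a minimal sketch along these lines should work; indexed_matches is my own name, not something from the snippets above:

indexed_matches = []
for word in search_words:
    for index, sentence in enumerate(sentences):
        if word in sentence and (index, sentence) not in indexed_matches:
            # skip duplicates when a sentence contains several search words
            indexed_matches.append((index, sentence))
for index, sentence in indexed_matches:
    print(index, sentence)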
Upvotes: 1