user9299756

Reputation: 1

How to get the index of a certain sentence in Python using NLTK?

I need to find the sentences in a text that contain certain words and output those sentences together with their indexes (by index I mean the sentence's number within the text).

Using the NLTK library, I split my text into sentences and output the ones I need:

Code:

from nltk.tokenize import sent_tokenize, word_tokenize
text = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
search_words = ["Ipsum", "Aldus"]
matches = []
sentances = sent_tokenize(text)
for word in search_words:
    for sentance in sentances:
        if word in sentance:
            matches.append(sentance)
print(matches)

Output

Using len I also got the total number of sentences, but I can't get the indexes of the matching sentences when I try to use .index:

index = sentances.index(matches)
print(index)

I'm getting this

Does anybody know how to resolve this?

I've tried to get the indexes of certain sentences.

Upvotes: 0

Views: 111

Answers (1)

RodP

Reputation: 393

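The list index method takes a single item to search for, not a list of items. Here sentences is your list of tokenized sentences (spelled sentances in your code). All you need to do is call index once per match: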

for match in matches:
    print(sentences.index(match))
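
To see why the original call fails, here is a minimal sketch. It assumes a short hand-written list in place of the output of sent_tokenize, so the sample sentences are illustrative only. list.index compares its argument against each element, so passing the whole matches list raises ValueError, while passing one sentence at a time returns that sentence's position.

```python
# Hand-written stand-in for sent_tokenize(text); these sentences are made up.
sentances = [
    "Lorem Ipsum is dummy text.",
    "It has survived five centuries.",
    "Aldus PageMaker included Lorem Ipsum.",
]
matches = [sentances[0], sentances[2]]

error_message = None
try:
    # Looks for an element equal to the entire list, which does not exist.
    sentances.index(matches)
except ValueError as err:
    error_message = str(err)

# Looking up each matched sentence separately works.
indexes = [sentances.index(match) for match in matches]
print(indexes)
```

Note that index returns the position of the first equal element only, so if the same sentence occurs twice in the text, both occurrences report the same number.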

Depending on your use case you might also want to try something like this.

sentences = sent_tokenize(text)
search_words = ["Ipsum", "Aldus"]
for word in search_words:
    for index, sentence in enumerate(sentences):
        if word in sentence:
            print(index)
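
If you need both the sentence number and the sentence text, the enumerate approach can be wrapped in a small helper. This is only a sketch: find_sentence_indexes is a hypothetical name, and it takes an already-split list of sentences (such as the one sent_tokenize returns) so that it runs without any NLTK data. It visits each sentence once, so a sentence containing several search words is reported a single time.

```python
def find_sentence_indexes(sentences, search_words):
    """Return (index, sentence) pairs for sentences containing any search word."""
    results = []
    for index, sentence in enumerate(sentences):
        # Plain substring test, the same check as `word in sentence` above.
        if any(word in sentence for word in search_words):
            results.append((index, sentence))
    return results


# Hand-split stand-in for sent_tokenize(text); the sentences are illustrative.
sentences = [
    "Lorem Ipsum is simply dummy text.",
    "It has survived five centuries.",
    "Aldus PageMaker included versions of Lorem Ipsum.",
]
for index, sentence in find_sentence_indexes(sentences, ["Ipsum", "Aldus"]):
    print(index, sentence)
```

The indexes are zero-based. If you want sentence numbers that start at 1, enumerate(sentences, start=1) should do it.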

Upvotes: 1
